Beautifulsoup XML 분석

AI, ML, DL 2019. 1. 15. 22:47 |

Beautifulsoup을 이용해 XML을 분석할 수 있다.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
from bs4 import BeautifulSoup
import urllib.request as req
import os.path
 
url = "http://www.weather.go.kr/weather/forecast/mid-term-rss3.jsp?stnId=108"
# 기상청 날씨누리 전국 중기예보 RSS
filename = "forecast.xml"
 
if not os.path.exists(filename):
    #req.urlretrieve(url, savename)
    # Legacy interface. It might become deprecated at some point in the future.
    with req.urlopen(url) as contents:
        xml = contents.read().decode("utf-8")
        # If the end of the file has been reached, read() will return an empty string ('').
        #print(xml)
        with open(filename, mode="wt") as f:
            f.write(xml)
 
with open(filename, mode="rt") as f:
    xml = f.read()
    soup = BeautifulSoup(xml, "html.parser")
    # html.parser는 모든 태그를 소문자로 바꾼다.
    #print(soup)
 
    print("[", soup.find("title").string, "]")
    print(soup.find("wf").string, "\n")
 
    # 날씨에 따른 지역 분류
    info = {}    # empty dicionary
    for location in soup.find_all("location"):
        name = location.find("city").string
        weather = location.find("wf").string
        if not (weather in info):
            info[weather] = []    # empty list. dictionary는 list를 value로 가질 수 있다.
        info[weather].append(name)
 
    for weather in info.keys():    # Return a new view of the dictionary’s keys.
        print("■", weather)
        for name in info[weather]:
            print("|-", name)
Colored by Color Scripter
cs

저작자표시 비영리 변경금지

'AI, ML, DL' 카테고리의 다른 글

OCR with Tesseract on Windows - Windows에서 테서랙트 사용하기 (0)	2020.10.07
CSV 분석 (0)	2019.01.20
JSON 분석 (0)	2019.01.18
[Scraping] Selenium으로 로그인이 필요한 싸이트 정보 가져오기 (0)	2019.01.01
[Scraping] Naver '이 시각 주요 뉴스' 목록 가져 오기 (0)	2018.12.30

Posted by J-sean

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

Software Engineer English & Software Engineering Blog - Sean

Category

Recent Posts

Recent Comments

Tags

Beautifulsoup XML 분석

'AI, ML, DL' 카테고리의 다른 글

티스토리툴바

개인정보

단축키

내 블로그

블로그 게시글

모든 영역