Posts tagged 'Scraping'

[Scraping] Exchange rates

AI, ML, DL 2024. 1. 2. 19:03

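Let's scrape exchange-rate information.

Pressing F12 in Chrome opens DevTools, which shows the page's HTML. It looks like the data can be picked out by tag name and class name.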
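The matching rule behind a call like find_all('strong', 'MainListItem_name__2Nl6J') can be illustrated without any third-party package. The sketch below uses the standard library's html.parser, not BeautifulSoup (which the post actually uses). It collects the text of every element whose tag name matches and whose class attribute contains the given class token. The sample HTML and its values are made up. Only the class names come from the post.

```python
from html.parser import HTMLParser

class TagClassCollector(HTMLParser):
    """Collect the text of every <tag> whose class attribute contains class_name."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.depth = 0     # > 0 while inside a matching element
        self.buffer = []   # text pieces of the element being read
        self.results = []  # text of each finished match, in document order

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # same tag nested inside a match
            return
        # class="a b" is a space-separated list; match on one token, like find_all() does
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.class_name in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)

# Made-up markup; only the class names are taken from the post.
sample = """
<ul>
  <li><strong class="MainListItem_name__2Nl6J">USD</strong>
      <span class="MainListItem_price__dP8R6">1,234.56</span></li>
  <li><strong class="MainListItem_name__2Nl6J">JPY</strong>
      <span class="MainListItem_price__dP8R6">987.65</span></li>
</ul>
"""

names = TagClassCollector("strong", "MainListItem_name__2Nl6J")
names.feed(sample)
prices = TagClassCollector("span", "MainListItem_price__dP8R6")
prices.feed(sample)
for name, price in zip(names.results, prices.results):
    print(f"{name}: {price}")  # → USD: 1,234.56, then JPY: 987.65
```

Each class name selects one "column" of the list. The n-th name and the n-th price then line up because both are returned in document order.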


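import urllib.request
from bs4 import BeautifulSoup as bs

url = "https://m.stock.naver.com/marketindex/home/exchangeRate/exchange"
with urllib.request.urlopen(url) as response:
    html = response.read().decode('utf-8')

soup = bs(html, 'html.parser')

# Currency names are <strong> tags and rates are <span> tags, each with its own class.
# The class names were taken from DevTools when this post was written (2024-01);
# the hashed-looking suffixes (e.g. __2Nl6J) may change when the site is rebuilt.
all_countries = soup.find_all('strong', 'MainListItem_name__2Nl6J')
all_rates = soup.find_all('span', 'MainListItem_price__dP8R6')

# find_all() returns matches in document order, so the n-th name is paired with the n-th rate.
for country, rate in zip(all_countries, all_rates):
    # get_text() also works when a tag has nested children (where .string would be None)
    print(f"{country.get_text(strip=True)}: {rate.get_text(strip=True)}")

# Debugging: number and print each list separately.
#for i, c in enumerate(all_countries):
#    print(i+1, c.string)

#for i, r in enumerate(all_rates):
#    print(i+1, r.string)

# First match only: find() instead of find_all().
#print(soup.find('strong', 'MainListItem_name__2Nl6J').string)
#print(soup.find('span', 'MainListItem_price__dP8R6').string)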
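The urlopen() call in the script sends urllib's default User-Agent (Python-urllib/3.x) and assumes the page is UTF-8. Some sites respond differently to that default client. Below is a minimal sketch that sends an explicit User-Agent and decodes with the charset the server declares. The value "Mozilla/5.0" is an arbitrary example, not something this post shows Naver requires. The helper names build_request and fetch_html are hypothetical. Also, if the downloaded HTML doesn't contain the class names you saw in DevTools, the list may be rendered by JavaScript after the page loads. In that case a browser-driven tool such as Selenium is an option.

```python
import urllib.request

def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that sends an explicit User-Agent header (value is an assumption)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch_html(url, timeout=10):
    """Download url and decode it with the charset from Content-Type, falling back to UTF-8."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)

request = build_request("https://m.stock.naver.com/marketindex/home/exchangeRate/exchange")
print(request.get_header("User-agent"))  # → Mozilla/5.0 (urllib stores the key as "User-agent")
# html = fetch_html(request.full_url)    # network call, left commented out
```

fetch_html(url) could stand in for the `with urllib.request.urlopen(url) ...` block. Its result goes to bs(html, 'html.parser') as before.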

 

Several elements share the same class name, so find_all() collects all of them and the loop prints each currency next to its rate.
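zip() pairs names and rates by position and silently stops at the shorter list. If one class name stops matching, or matches an extra element, the output comes out misaligned or truncated with no error. A small guard is sketched here. Plain strings stand in for the tags returned by find_all(), and the values are made up. On Python 3.10+, zip(countries, rates, strict=True) gives the same protection by raising ValueError on a length mismatch.

```python
def pair_rates(countries, rates):
    """Pair each country name with its rate, refusing lists of different lengths.

    zip() alone would silently drop or misalign entries when the lengths differ.
    """
    if len(countries) != len(rates):
        raise ValueError(
            f"found {len(countries)} names but {len(rates)} rates; "
            "the page layout or class names may have changed"
        )
    return list(zip(countries, rates))

print(pair_rates(["USD", "JPY"], ["1,234.56", "987.65"]))
try:
    pair_rates(["USD", "JPY"], ["1,234.56"])
except ValueError as err:
    print("error:", err)
```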

Attribution-NonCommercial-NoDerivatives


Posted by J-sean


[Scraping] Getting information from a site that requires login, with Selenium

AI, ML, DL 2019. 1. 1. 16:02

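With Selenium and ChromeDriver you can retrieve information that a website only shows after you log in.

The code below connects to the online bookstore Yes24, logs in, and fetches the order history.

If you want to watch Chrome while it runs, just remove the options-related code (in particular the headless argument).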
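Instead of deleting code to see the browser, headless mode can be turned into a switch. Below is a sketch assuming a hypothetical --show command-line flag. chrome_arguments() only returns the switch strings the script already uses, so this part runs without Selenium installed.

```python
import argparse

def chrome_arguments(headless=True):
    """Chrome switches used in the script; --headless is added only when requested."""
    args = ["--window-size=1920,1080", "--log-level=3"]
    if headless:
        args.insert(0, "--headless")
    return args

def parse_cli(argv=None):
    """Parse the (hypothetical) --show flag; argv=None reads sys.argv."""
    parser = argparse.ArgumentParser(description="Yes24 order-list scraper")
    parser.add_argument("--show", action="store_true",
                        help="run Chrome with a visible window instead of headless")
    return parser.parse_args(argv)

cli = parse_cli(["--show"])  # as if run with: python scraper.py --show
print(chrome_arguments(headless=not cli.show))
```

In the script, the list would be fed to ChromeOptions with `for arg in chrome_arguments(headless=not parse_cli().show): options.add_argument(arg)`.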


from selenium import webdriver
from selenium.webdriver.common.by import By
# from selenium.webdriver.chrome.service import Service

login_url = "https://www.yes24.com/Templates/FTLogin.aspx"
user_id = "your-id"  # renamed from `id`, which shadows a Python built-in
password = "your-password"

options = webdriver.ChromeOptions()
options.add_argument("--headless")  # headless mode: no visible browser window
options.add_argument("--window-size=1920,1080")  # width,height; sets the viewport for screenshots in headless mode
options.add_argument("--log-level=3")  # without this, headless mode prints a lot of log output

# Selenium 4.6+ locates or downloads a matching chromedriver automatically.
# To use a specific driver binary instead:
# driver = webdriver.Chrome(service=Service(r"c:\download\chromedriver.exe"), options=options)
driver = webdriver.Chrome(options=options)
try:
    driver.implicitly_wait(3)  # wait up to 3 s for elements to appear before find_element() fails
    driver.get(login_url)
    print("Title: %s\nLogin URL: %s" % (driver.title, driver.current_url))

    # find_element_by_id() and friends were removed in Selenium 4.3; use find_element(By...., ...)
    id_elem = driver.find_element(By.ID, "SMemberID")
    id_elem.clear()
    id_elem.send_keys(user_id)

    pw_elem = driver.find_element(By.ID, "SMemberPassword")
    pw_elem.clear()
    pw_elem.send_keys(password)
    # clear() and send_keys() return None, so they can't be chained.

    login_btn = driver.find_element(By.XPATH, '//*[@id="btnLogin"]')
    login_btn.click()  # returns None
    # driver.find_element(By.XPATH, '//*[@id="btnLogin"]').click()

    list_url = "https://ssl.yes24.com//MyPageOrderList/MyPageOrderList"
    driver.get(list_url)

    print("-" * 30, "Order list", "-" * 30)
    lists = driver.find_elements(By.CSS_SELECTOR, "#MyOrderListTbl span")
    for i, item in enumerate(lists):
        print("%d: %s" % (i + 1, item.text))

    driver.save_screenshot("screenshot.png")
finally:
    driver.quit()  # close the browser even if a step above fails

'screenshot.png', captured by driver.save_screenshot()

※ Selenium Documentation
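The script keeps the ID and password in the source file. Below is a minimal sketch for keeping them out of it. It assumes environment variables named YES24_ID and YES24_PASSWORD, which are hypothetical names chosen for this example. If the password variable is missing, it falls back to an interactive getpass() prompt.

```python
import os
from getpass import getpass

def load_credentials(env=None, prompt=getpass):
    """Return (user_id, password) from YES24_ID / YES24_PASSWORD (hypothetical names).

    The password is requested interactively only when the variable is not set.
    """
    env = os.environ if env is None else env
    user_id = env.get("YES24_ID")
    if not user_id:
        raise RuntimeError("set the YES24_ID environment variable")
    password = env.get("YES24_PASSWORD") or prompt("Password: ")
    return user_id, password

# Demo with an explicit mapping instead of the real environment.
user_id, password = load_credentials({"YES24_ID": "alice", "YES24_PASSWORD": "secret"})
print(user_id)
```

In the script, `user_id, password = load_credentials()` would replace the two hard-coded assignments.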


[Scraping] Getting Naver's '이 시각 주요 뉴스' (top news right now) list

AI, ML, DL 2018. 12. 30. 19:07

This uses urllib and BeautifulSoup to fetch the '이 시각 주요 뉴스' (top news right now) headline list from Naver News.

from bs4 import BeautifulSoup as bs
import urllib.request as req

url = 'https://news.naver.com/'
with req.urlopen(url) as res:
    soup = bs(res, 'html.parser')

# Class names reflect Naver News' layout when this post was written (2018-12) and may have changed.
# A class string in find() matches any element whose class list contains it,
# so 'tit_h4' also matches class="tit_h4 tit_main1".
title1 = soup.find('h4', 'tit_h4').string
# Matching the full class attribute string also works:
# title1 = soup.find(attrs = {'class' : 'tit_h4 tit_main1'}).string
print('\t\tNAVER', title1, '\n')

print('-'*20, 'find_all()', '-'*20)
headlines1 = soup.find_all('a', 'nclicks(hom.headcont)')
# headlines1 = soup.find_all(attrs = {'class' : 'nclicks(hom.headcont)'})
# The find_all() method looks through a tag's descendants and
# retrieves all descendants that match your filters.
for i, news in enumerate(headlines1):
    # .string is None when the <a> has more than one child (e.g. text plus an <img>);
    # get_text(strip=True) returns the combined text instead.
    print('%2d: %s' % (i + 1, news.get_text(strip=True)))

print('-'*20, 'select()', '-'*20)
headlines2 = soup.select('div.newsnow_tx_inner > a')
# Beautiful Soup supports the most commonly-used CSS selectors. Just pass a
# string into the .select() method of a Tag object or the BeautifulSoup object itself.
# '>' keeps only <a> tags that are direct children of div.newsnow_tx_inner.
for i, news in enumerate(headlines2):
    print('%2d: %s' % (i + 1, news.get_text(strip=True)))
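The two loops in the script can print different lists. find_all('a', 'nclicks(hom.headcont)') matches links by their own class anywhere in the page. select('div.newsnow_tx_inner > a') matches only <a> tags that are direct children of a div with class newsnow_tx_inner. The sketch below shows what the `>` (child combinator) changes. It is a dependency-free illustration with the standard library's html.parser, not BeautifulSoup. The sample HTML is made up, and only the class name comes from the post.

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they are not kept on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class LinkCollector(HTMLParser):
    """Collect the text of <a> tags under <div class="parent_class">.

    direct=True mimics 'div.parent_class > a' (the <a> must be a direct child);
    direct=False mimics 'div.parent_class a' (any descendant).
    """

    def __init__(self, parent_class, direct=True):
        super().__init__()
        self.parent_class = parent_class
        self.direct = direct
        self.stack = []  # (tag, class list) for every currently open element
        self.capturing = False
        self.buffer = []
        self.results = []

    def _is_parent(self, entry):
        tag, classes = entry
        return tag == "div" and self.parent_class in classes

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            if self.direct:
                matched = bool(self.stack) and self._is_parent(self.stack[-1])
            else:
                matched = any(self._is_parent(entry) for entry in self.stack)
            if matched:
                self.capturing = True
                self.buffer = []
        if tag not in VOID_TAGS:
            classes = (dict(attrs).get("class") or "").split()
            self.stack.append((tag, classes))

    def handle_endtag(self, tag):
        if tag == "a" and self.capturing:
            self.results.append("".join(self.buffer).strip())
            self.capturing = False
        # Pop back to the matching open tag; tolerates a few unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.capturing:
            self.buffer.append(data)

sample = """
<div class="newsnow_tx_inner"><a href="#1">Headline one</a></div>
<div class="newsnow_tx_inner"><p><a href="#2">Headline two</a></p></div>
"""

child_only = LinkCollector("newsnow_tx_inner", direct=True)
child_only.feed(sample)
any_depth = LinkCollector("newsnow_tx_inner", direct=False)
any_depth.feed(sample)
print(child_only.results)  # → ['Headline one']
print(any_depth.results)   # → ['Headline one', 'Headline two']
```

The second link sits inside a <p>, so the child combinator skips it while a plain descendant selector keeps it. If the real page wraps headlines in an extra element, `>` selects nothing.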


Software Engineer English & Software Engineering Blog - Sean
