Python BeautifulSoup basic parser (Python 3)
Petabyte
2018. 12. 23. 14:33
Basic parser structure
To get only the div content from http://www.pythonscraping.com/pages/page1.html (the basic case):
from urllib.request import urlopen
Printed result:
<div>
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</div>
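The basic case described above, parsing a page and taking its first div, might look like the following minimal sketch. It assumes the bs4 package is installed; the helper name first_div is hypothetical and only used here for illustration. The parsing step is kept separate from the download so it can be tried on any HTML string.

```python
from bs4 import BeautifulSoup


def first_div(html):
    """Return the first <div> tag found in html, or None if there is none.

    first_div is a hypothetical helper name used only for this sketch.
    """
    # "html.parser" is the parser bundled with Python's standard library.
    bs_obj = BeautifulSoup(html, "html.parser")
    # Attribute access on the soup returns the first tag with that name.
    return bs_obj.div
```

To use it with the page above, pass in the downloaded content, for example first_div(urlopen("http://www.pythonscraping.com/pages/page1.html").read()), and print the returned tag.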
Using except to report errors
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

try:
    html = urlopen("http://www.pythonscraping.com/pages/page1.html")
except HTTPError as e:
    # If a problem occurs, print the error.
    print(e)
else:
    # No error occurred, so the program continues.
    bsObj = BeautifulSoup(html.read(), "html.parser")
    print(bsObj.div)
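The try/except pattern above could be wrapped in a reusable function, as in the rough sketch below. This is one possible arrangement, not the only one. The function name fetch_first_div and its opener parameter are hypothetical, introduced only so the error path can be exercised without a network connection. The sketch also catches URLError, which urlopen raises when the server cannot be reached at all, and relies on the fact that bsObj.div is None when the page has no div.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_first_div(url, opener=urlopen):
    """Return the first <div> at url, or None if the fetch fails or no <div> exists.

    fetch_first_div and its opener parameter are hypothetical names for this sketch.
    """
    try:
        html = opener(url)
    except (HTTPError, URLError) as e:
        # HTTPError: the server answered with an error status such as 404.
        # URLError (the parent class of HTTPError): the server was unreachable.
        print(e)
        return None
    bs_obj = BeautifulSoup(html.read(), "html.parser")
    # Attribute access such as bs_obj.div yields None when the tag is absent.
    return bs_obj.div
```

A caller can then check the result with "is None" before using it, instead of repeating the try/except block at every call site.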