Basic parser structure


Basic case: fetching http://www.pythonscraping.com/pages/page1.html and printing only its div element. bsObj.div returns the first <div> tag in the document.



from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page1.html")
bsObj = BeautifulSoup(html.read(), "html.parser")
print(bsObj.div)

Output of print

<div>

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

</div>




Adding error reporting with except

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

try:
    html = urlopen("http://www.pythonscraping.com/pages/page1.html")
except HTTPError as e:
    # Report the error if a problem occurred.
    print(e)
else:
    # No error occurred, so the program continues.
    bsObj = BeautifulSoup(html.read(), "html.parser")
    print(bsObj.div)
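
The try/except above catches only HTTPError, which is raised when the server responds with an error status. If the server cannot be reached at all, urlopen raises URLError instead (HTTPError is a subclass of it). The following is a minimal sketch of handling both cases. The helper name fetch_html and the 10-second timeout are assumptions made for illustration, not part of urllib or of the original example.

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None if the request failed.

    fetch_html is a hypothetical helper name, not part of urllib or bs4.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError must be caught first because it is a subclass of URLError.
        print("HTTP error:", e.code, e.reason)
    except URLError as e:
        # No usable response at all (unknown host, refused connection,
        # unsupported URL scheme, and so on).
        print("URL error:", e.reason)
    return None
```

A caller could check the result before parsing: if fetch_html(url) returns something other than None, pass it to BeautifulSoup(html, "html.parser") as in the example above.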
