BeautifulSoup Crawling

Petabyte 2020. 1. 5. 15:19


from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

def getTitle(url):

    try:
        html = urlopen(url)
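    except HTTPError as e:
        # Without a response there is nothing to parse, so stop here
        # instead of falling through to the parsing step with html undefined.
        print(e)
        return None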

    try:
        bsObj = BeautifulSoup(html.read(),"html.parser")
        # Case 1: spans whose class is red or green
        #nameList = bsObj.find_all("span",{"class":{"green","red"}})
        #for name in nameList:
        #    content = name.get_text()
        #    print(content)

        # Case 2: count the text nodes that are exactly "the prince", using len()
        #nameList = bsObj.find_all(string="the prince")
        #print(len(nameList))


        # Case 3: find and print every element whose id is "text".
        # The result is a list, so index into it to print a single item.
        allText = bsObj.find_all(id='text')
        print(allText)
        #print(allText[0].get_text())

    except AttributeError as e:
        print(e)
        pass


print('Search Start=================')
url = 'http://www.pythonscraping.com/pages/warandpeace.html'

getTitle(url)


print('Search End=================')
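
The try/except around urlopen in the script above only catches HTTPError. Below is a minimal sketch of a fetch step that also handles an unreachable server and returns None on failure, so the caller can skip parsing. The helper name fetch_html and the timeout default are my own assumptions and do not come from the original script.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    fetch_html is a hypothetical helper used for illustration only.
    """
    try:
        # The context manager closes the connection once the body is read.
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print("HTTP error:", e.code)
    except URLError as e:
        # The server could not be reached (bad host name, refused connection).
        print("URL error:", e.reason)
    return None
```

With a helper like this, the parsing step would start with a check such as `if html is None: return` before handing the bytes to BeautifulSoup.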

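
The three lookups mentioned in the comments of the script (by class, by exact text, by id) can be tried without any network access. The sketch below runs them against a small inline HTML string. SAMPLE_HTML and its contents are made up for illustration and are not the markup of the warandpeace.html page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<div id="text">
  <span class="red">first quote</span>
  <span class="green">Anna Pavlovna</span>
  <span class="green">the prince</span>
  <span class="blue">not selected</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# 1) Spans whose class is either "green" or "red", in document order.
colored = soup.find_all("span", {"class": ["green", "red"]})
colored_texts = [span.get_text() for span in colored]

# 2) Text nodes that are exactly "the prince".
#    string= is the newer name for the text= argument.
prince_count = len(soup.find_all(string="the prince"))

# 3) Elements with id="text". find_all returns a list-like result,
#    so a single element is picked by index.
by_id = soup.find_all(id="text")
first_block_text = by_id[0].get_text()

print(colored_texts)
print(prince_count)
print(first_block_text)
```

Note that the exact-text lookup only counts text nodes equal to the whole string. A phrase embedded in a longer sentence would need a regular expression or a function passed as the filter instead.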
License: Attribution, NonCommercial, NoDerivatives
