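Selenium module

Using the Selenium module

The Selenium module automates a browser. Because it drives the browser the way a user would, it can also be used to scrape data from the pages the browser loads.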

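How to use it

1. Install the selenium module
pip3 install selenium
2. Find the driver version that matches your browser version and download the Chrome driver (ChromeDriver)
# Driver download URL
https://npm.taobao.org/mirrors/chromedriver/
3. Once the driver is downloaded, put it in your project directory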
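
Before writing any scraping code, it can help to confirm that both pieces from the steps above are in place. The following is a minimal sketch that uses only the standard library; the function name check_setup, the default driver name "chromedriver", and the project_dir parameter are illustrative choices for this sketch, not part of Selenium.

```python
import importlib.util
import shutil
from pathlib import Path


def check_setup(driver_name="chromedriver", project_dir="."):
    """Report whether selenium is importable and where a driver binary was found.

    driver_name and project_dir are hypothetical defaults for this sketch.
    """
    selenium_installed = importlib.util.find_spec("selenium") is not None

    # Look in the project directory first (step 3), then fall back to PATH.
    candidates = [
        Path(project_dir) / driver_name,
        Path(project_dir) / (driver_name + ".exe"),
    ]
    local = next((str(p) for p in candidates if p.is_file()), None)
    driver_path = local or shutil.which(driver_name)

    return {"selenium_installed": selenium_installed, "driver_path": driver_path}


if __name__ == "__main__":
    print(check_setup())
```

If driver_path comes back as None, the driver was found neither in the project directory nor on PATH.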

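Basic Selenium usage

from selenium import webdriver  # module that loads the browser driver

url = 'https://www.pixiv.net/users/30330894/following'
# Selenium 3 style. In Selenium 4, executable_path is deprecated; pass
# service=Service('chromedriver.exe') from selenium.webdriver.chrome.service instead.
chrome = webdriver.Chrome(executable_path='chromedriver.exe')  # load the Chrome driver
chrome.get(url)  # open the url in the browser

chrome.page_source  # HTML of the current page; it can be parsed with lxml

    .quit()   exits the browser and also shuts down the ChromeDriver process
    .close()  closes the current window; if it is the only window, the whole browser closes, but the ChromeDriver process keeps running in the background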
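
Since a forgotten quit() leaves a ChromeDriver process behind, one option is to wrap the fetch in try/finally. The sketch below assumes a driver object with the get, page_source, and quit members shown above; fetch_page_source and StubDriver are hypothetical names, and StubDriver exists only so the sketch can be dry-run without a browser.

```python
def fetch_page_source(driver, url):
    """Load url in the given driver and return the page HTML.

    The driver is always quit, even if loading fails, so no ChromeDriver
    process is left running.
    """
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


class StubDriver:
    """Hypothetical stand-in for webdriver.Chrome, used only to dry-run the sketch."""

    def __init__(self, html):
        self.page_source = html
        self.calls = []

    def get(self, url):
        self.calls.append(("get", url))

    def quit(self):
        self.calls.append(("quit",))


if __name__ == "__main__":
    stub = StubDriver("<html><body>hello</body></html>")
    print(fetch_page_source(stub, "https://example.com"))
    print(stub.calls)
```

With a real browser, the same function would be called with the chrome object created by webdriver.Chrome(...) in place of the stub.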

How Selenium locates elements

The find_element_by_* helpers below are the Selenium 3 API. Selenium 4 removed them in favor of find_element(By.<STRATEGY>, value) and find_elements(By.<STRATEGY>, value).

1. Locate by xpath
    .find_element_by_xpath("xpath")   # finds the element matching the xpath; returns a single element
    .find_elements_by_xpath("xpath")  # finds all matching elements; returns a list

2. Locate by id
    .find_element_by_id(id_)
    .find_elements_by_id(id_)

3. Locate by name
    .find_element_by_name(name)
    .find_elements_by_name(name)

4. Locate by class
    .find_element_by_class_name(name)
    .find_elements_by_class_name(name)

5. Locate by tag name
    .find_element_by_tag_name(name)
    .find_elements_by_tag_name(name)

6. Locate by link text
    .find_element_by_link_text(link_text)
    .find_elements_by_link_text(link_text)

7. Locate by CSS selector
    .find_element_by_css_selector(css_selector)
    .find_elements_by_css_selector(css_selector)
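
Each of the seven helpers above corresponds to one locator strategy in Selenium 4's find_element(by, value) form. The sketch below maps a legacy method name to the (strategy, value) pair; the strategy strings mirror the constants on selenium.webdriver.common.by.By, while to_locator and the mapping table are hypothetical helpers written for this note.

```python
# Strategy strings; these mirror the constants on selenium.webdriver.common.by.By
# (By.XPATH == "xpath", By.CLASS_NAME == "class name", and so on).
LEGACY_SUFFIX_TO_STRATEGY = {
    "xpath": "xpath",
    "id": "id",
    "name": "name",
    "class_name": "class name",
    "tag_name": "tag name",
    "link_text": "link text",
    "css_selector": "css selector",
}


def to_locator(legacy_method, value):
    """Translate e.g. ("find_element_by_id", "kw") into ("id", "kw")."""
    for prefix in ("find_elements_by_", "find_element_by_"):
        if legacy_method.startswith(prefix):
            suffix = legacy_method[len(prefix):]
            if suffix in LEGACY_SUFFIX_TO_STRATEGY:
                return (LEGACY_SUFFIX_TO_STRATEGY[suffix], value)
    raise ValueError(f"not a known find_element_by_* name: {legacy_method}")


if __name__ == "__main__":
    print(to_locator("find_element_by_id", "kw"))
    print(to_locator("find_elements_by_class_name", "item"))
```

With a Selenium 4 driver, driver.find_element(*to_locator("find_element_by_id", "kw")) would then be equivalent to driver.find_element(By.ID, "kw").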

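Element operations

Once you have located an element, you can operate on it. Below, element is a WebElement returned by one of the find_element calls (not the driver itself).

0. Get the position of an element on the page
    .location
        e.g. res = chrome.find_elements_by_xpath('/html/body/div/div[2]/div[1]/div/a[1]')[0].location
            >>> {'x': 15, 'y': 72}

1. click
    element.click()  # click the element

2. clear
    element.clear()  # clear the contents of an input field

3. send_keys()
    element.send_keys(data)  # type a value into the element

4. text
    element.text  # the text inside the element

5. get_attribute()
    element.get_attribute(name)  # the value of the named attribute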
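
The operations above are usually combined, for example to fill in a search box and press a button, or to read several properties of a link at once. The following sketch assumes elements that expose the WebElement members listed above; fill_and_submit, describe, and StubElement are hypothetical names, and StubElement only imitates a WebElement so the sketch runs without a browser.

```python
def fill_and_submit(input_element, submit_element, text):
    """Clear an input, type text into it, then click a submit control."""
    input_element.clear()
    input_element.send_keys(text)
    submit_element.click()


def describe(element):
    """Collect the read-only details of an element into one dict."""
    return {
        "text": element.text,
        "href": element.get_attribute("href"),
        "location": element.location,
    }


class StubElement:
    """Hypothetical stand-in for a WebElement, used only to dry-run the sketch."""

    def __init__(self, text="", attributes=None, location=None):
        self.text = text
        self.location = location or {"x": 0, "y": 0}
        self._attributes = attributes or {}
        self.calls = []

    def clear(self):
        self.calls.append("clear")

    def send_keys(self, value):
        self.calls.append(("send_keys", value))

    def click(self):
        self.calls.append("click")

    def get_attribute(self, name):
        return self._attributes.get(name)


if __name__ == "__main__":
    box, button = StubElement(), StubElement()
    fill_and_submit(box, button, "selenium")
    print(box.calls, button.calls)
```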

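Running Selenium with a headless browser

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--headless')     # run Chrome without a visible window
chrome_options.add_argument('--disable-gpu')

# Pass the options when creating the driver (the older chrome_options= keyword is deprecated)
chrome = webdriver.Chrome(options=chrome_options)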
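
The headless setup can be wrapped in a small factory so the switches live in one place. This is a sketch under a few assumptions: the names headless_arguments and make_headless_chrome are hypothetical, the --window-size switch is an optional extra that is not in the notes above, and the driver binary is assumed to be discoverable without an explicit path.

```python
def headless_arguments(width=1920, height=1080):
    """Command-line switches for headless Chrome.

    The window size is an assumption of this sketch, not a requirement.
    """
    return ["--headless", "--disable-gpu", f"--window-size={width},{height}"]


def make_headless_chrome():
    """Create a headless Chrome driver (needs selenium and a matching ChromeDriver)."""
    # Imported lazily so headless_arguments() works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    print(headless_arguments())
```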

Avoiding detection of Selenium

from selenium import webdriver

option = webdriver.ChromeOptions()
# Exclude the enable-automation switch when Chrome is started
option.add_experimental_option('excludeSwitches', ['enable-automation'])
chrome = webdriver.Chrome(options=option)
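
One signal that pages commonly inspect is the navigator.webdriver property, which can be read through the driver's execute_script method. The sketch below is a diagnostic only: whether the option above changes this value depends on the Chrome version, and looks_automated and FakeBrowser are hypothetical names (FakeBrowser merely imitates a driver so the sketch runs without a browser).

```python
def looks_automated(driver):
    """Return True if the page sees navigator.webdriver as a truthy value."""
    return bool(driver.execute_script("return navigator.webdriver"))


class FakeBrowser:
    """Hypothetical stand-in for a driver; returns a canned script result."""

    def __init__(self, flag):
        self._flag = flag

    def execute_script(self, script):
        return self._flag


if __name__ == "__main__":
    print(looks_automated(FakeBrowser(True)))
    print(looks_automated(FakeBrowser(None)))
```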

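Switching into an iframe with Selenium

Some pages use an iframe to embed one page inside another. In that case the data of the embedded page cannot be read directly; switch into the iframe first, then fetch the data. Each case below shows the older switch_to_frame spelling, which is deprecated, followed by the current switch_to.frame spelling.
1. The iframe has a unique id: pass the id
driver.switch_to_frame("x-URS-iframe")  # deprecated spelling
driver.switch_to.frame("x-URS-iframe")

2. The iframe has a unique name: pass the name
driver.switch_to_frame("xxxx")  # deprecated spelling
driver.switch_to.frame("xxxx")

3. No id and no name: locate the iframe element first
iframe = driver.find_elements_by_tag_name("iframe")[0]
driver.switch_to_frame(iframe)  # deprecated spelling
driver.switch_to.frame(iframe)

4. Select the iframe directly by index
driver.switch_to_frame(0)  # deprecated spelling; switches to the first iframe
driver.switch_to.frame(0)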
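
After reading from an iframe, the driver stays inside it until it is switched back, which is easy to forget. A possible pattern is a context manager that switches in and always returns to the top-level page with switch_to.default_content(). In this sketch, inside_frame, StubSwitchTo, and StubFrameDriver are hypothetical names; the stubs only imitate a driver so the sketch runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for a with-block, then return to the top-level page.

    frame_reference may be an id, a name, an index, or an iframe element.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        driver.switch_to.default_content()


class StubSwitchTo:
    """Hypothetical stand-in for driver.switch_to; records each call."""

    def __init__(self, log):
        self._log = log

    def frame(self, reference):
        self._log.append(("frame", reference))

    def default_content(self):
        self._log.append(("default_content",))


class StubFrameDriver:
    """Hypothetical stand-in for a driver, used only to dry-run the sketch."""

    def __init__(self):
        self.log = []
        self.switch_to = StubSwitchTo(self.log)
        self.page_source = "<html>inner</html>"


if __name__ == "__main__":
    driver = StubFrameDriver()
    with inside_frame(driver, "x-URS-iframe") as inner:
        print(inner.page_source)
    print(driver.log)
```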