Using Selenium in Python to get data from a dynamic website: how to discover how the database queries are made?
I had some experience with coding before, but not specifically for web applications. I have been tasked with getting data from this website: http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-de-derivativos/precos-referenciais/taxas-referenciais-bm-fbovespa/
The data are available on a day-to-day basis. I have used Selenium in Python, and so far the results are good: I can get the entire table, store it in a pandas DataFrame, and then write it to a MySQL database. The problem is that the result from the website is always the same!
Here is my code:
from selenium import webdriver
from bs4 import BeautifulSoup as bs
import time
import pandas as pd

def GetDataFromWeb(day, month, year):
    options = webdriver.ChromeOptions()
    options.add_argument('headless')
    options.add_argument('window-size=1920x1080')
    #had to use these two below because of webdriver crashing issues
    options.add_argument('no-sandbox')
    options.add_argument('disable-dev-shm-usage')
    driver = webdriver.Chrome(chrome_options=options)
    driver.get("http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-de-derivativos/precos-referenciais/taxas-referenciais-bm-fbovespa/")
    #the table is on an iframe
    iframe = driver.find_element_by_id("bvmf_iframe")
    driver.switch_to.default_content()
    driver.switch_to.frame(iframe)
    #getting to the place where I should input the data
    date = driver.find_element_by_id("Data")
    date.send_keys("/".join((str(day),str(month),str(year))))
    date = driver.find_element_by_tag_name("button").click()
    #I have put this wait just to be sure it doesn't try to get info from an unloaded page
    time.sleep(5)
    page = bs(driver.page_source,"html.parser")
    table = page.find(id='tb_principal1')
    headers = ['Dias Corridos', '252','360']
    matrix = []
    for rows in table.select('tr')[2:]:
        values = []
        for columns in rows.select('td'):
            values.append(columns.text.replace(',','.'))
        matrix.append(values)
    df = pd.DataFrame(data=matrix, columns=headers)
    driver.close()
    #only the first 2 columns are interesting for my purposes
    return df.iloc[:,0:2]
The table resulting from this function is always the same, no matter what inputs I send to it. The values seem to correspond to the date 06/09/2018 (month=09, day=06). I think the main problem is that I don't know how the queries to their database are made, so this always runs with something like a "default date". I have read some people talking about Ajax and JavaScript requests, but I don't know if that's the case here. How can I tell?
1 Answer
This code will work (I updated a few lines in your code):
from selenium import webdriver
from bs4 import BeautifulSoup as bs
import time
import pandas as pd

def GetDataFromWeb(day, month, year):
    # zero-pad day and month to avoid a data error in the date handler
    if month < 10:
        month = "0" + str(month)
    if day < 10:
        day = "0" + str(day)
    options = webdriver.ChromeOptions()
    options.add_argument('headless')
    options.add_argument('window-size=1920x1080')
    # had to use these two below because of webdriver crashing issues
    options.add_argument('no-sandbox')
    options.add_argument('disable-dev-shm-usage')
    driver = webdriver.Chrome(chrome_options=options)
    driver.get("http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-de-derivativos/precos-referenciais/taxas-referenciais-bm-fbovespa/")
    # the table is on an iframe
    iframe = driver.find_element_by_id("bvmf_iframe")
    driver.switch_to.default_content()
    driver.switch_to.frame(iframe)
    # getting to the place where I should input the data
    date = driver.find_element_by_id("Data")
    date.clear()  # clear the auto-populated date
    date.send_keys((str(day), str(month), str(year)))  # removed the join part, so no "/" is typed
    driver.find_element_by_tag_name("button").click()
    # I have put this wait just to be sure it doesn't try to get info from an unloaded page
    time.sleep(50)
    page = bs(driver.page_source, "html.parser")
    table = page.find(id='tb_principal1')
    headers = ['Dias Corridos', '252', '360']
    matrix = []
    for rows in table.select('tr')[2:]:
        values = []
        for columns in rows.select('td'):
            values.append(columns.text.replace(',', '.'))
        matrix.append(values)
    df = pd.DataFrame(data=matrix, columns=headers)
    driver.close()
    # only the first 2 columns are interesting for my purposes
    return df.iloc[:, 0:2]

print(GetDataFromWeb(3, 9, 2018))
It will print the matching data for the required date.
I have added the following to avoid a data error in the date handler:
if month < 10:
    month = "0" + str(month)
if day < 10:
    day = "0" + str(day)
and these two lines, to clear the auto-populated date and to send the date without the join:
date.clear()
date.send_keys((str(day), str(month), str(year)))
Note that the problem in your code was that the day and month fields take two-digit numbers, and the line
date.send_keys("/".join((str(day), str(month), str(year))))
was generating an error, because of which the system date was picked, so you always saw the same data for any input date. Also, when you click on the date field it picks a default date, so we first have to clear it and then send the custom date. Hope this helps.
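The zero-padding above can also be done with string formatting instead of the two if-blocks. Here is a minimal sketch, assuming the field expects the digits ddmmyyyy with no separators (which is what the working code above ends up typing). The name format_date_digits is a hypothetical helper, not part of Selenium.

```python
import datetime

def format_date_digits(day, month, year):
    """Return the date as zero-padded ddmmyyyy digits (hypothetical helper)."""
    # datetime.date raises ValueError for impossible dates such as 31/02,
    # so a bad input fails here instead of being typed into the page.
    d = datetime.date(year, month, day)
    return "{:02d}{:02d}{:04d}".format(d.day, d.month, d.year)

digits = format_date_digits(3, 9, 2018)
print(digits)
```

The result could then be passed as a single string, for example date.send_keys(format_date_digits(3, 9, 2018)), right after date.clear().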
Update for additional query: Add these imports
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
Then replace the time.sleep(50) line with this wait:
WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR,'#divContainerIframeBmf > form > div > div > div:nth-child(1) > div:nth-child(3) > div > div > p')))
Add this line in place of the sleep: WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR,'#divContainerIframeBmf > form > div > div > div:nth-child(1) > div:nth-child(3) > div > div > p'))) with these imports: from selenium.webdriver.support.ui import WebDriverWait; from selenium.webdriver.support import expected_conditions as EC; from selenium.webdriver.common.by import By
– thebadguy
Sep 13 '18 at 6:07
Thank you very much! If it's not too much to ask, could you also give me a hint about how I can replace this "time.sleep(50)" line? I wanted to use webdriver.wait, with the condition being that the website's database query has completed, but I don't know exactly how it does that. Is there a "jquery done" or something like that?
– Guilherme Moreira Barbosa
Sep 13 '18 at 5:45