Using selenium in python to get data from dynamic website: how to discover the way databases querys are done?

Using selenium in python to get data from dynamic website: how to discover the way databases querys are done?



I had some experience with coding before, but not specifically for web applications. I have been tasked with getting data from this website: http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-de-derivativos/precos-referenciais/taxas-referenciais-bm-fbovespa/



They are avaliable on a day-to-day basis. I have used selenium in Python, and so far the results are good: I can get the entire table, store it in a pandas dataframe, and then to a mysql database and stuff. The problem is: the result from the website is always the same!



Here is my code:


from selenium import webdriver
from bs4 import BeautifulSoup as bs
import time
def GetDataFromWeb(day, month, year):
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument('window-size=1920x1080')
#had to use these two below because of webdriver crashing issues
options.add_argument('no-sandbox')
options.add_argument('disable-dev-shm-usage')

driver = webdriver.Chrome(chrome_options=options)

driver.get("http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-de-derivativos/precos-referenciais/taxas-referenciais-bm-fbovespa/")

#the table is on an iframe
iframe = driver.find_element_by_id("bvmf_iframe")
driver.switch_to.default_content()
driver.switch_to.frame(iframe)

#getting to the place where I should input the data
date = driver.find_element_by_id("Data")
date.send_keys("/".join((str(day),str(month),str(year))))
date = driver.find_element_by_tag_name("button").click()

#I have put this wait just to be sure it doesn't try to get info from an unloaded page
time.sleep(5)

page = bs(driver.page_source,"html.parser")

table = page.find(id='tb_principal1')

headers = ['Dias Corridos', '252','360']

matrix =
for rows in table.select('tr')[2:]:
values =
for columns in rows.select('td'):
values.append(columns.text.replace(',','.'))
matrix.append(values)

df = pd.DataFrame(data=matrix, columns=headers)

driver.close()

#only the first 2 columns are interesting for my purposes
return df.iloc[:,0:2]



The table resulting from this function is always the same, no matter what inputs I send to it. And they seem to be from the corresponding date of 06/09/2018 (month=09,day=06). I think the main problem is that I don't know how the queries to their database is done, so this always runs like a "default date". I have read some people talking about Ajax and JavaScript requests, but I don't know if that's the case. How can I tell?




1 Answer
1



This code will work(updated few lines in your code)


from selenium import webdriver
from bs4 import BeautifulSoup as bs
import time
import pandas as pd
def GetDataFromWeb(day, month, year):

***#to avoid data error in date handler***
if month < 10:
month="0"+str(month)
if day < 10:
day="0"+str(day)

options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument('window-size=1920x1080')
#had to use these two below because of webdriver crashing issues
options.add_argument('no-sandbox')
options.add_argument('disable-dev-shm-usage')

driver = webdriver.Chrome(chrome_options=options)

driver.get("http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-de-derivativos/precos-referenciais/taxas-referenciais-bm-fbovespa/")

#the table is on an iframe
iframe = driver.find_element_by_id("bvmf_iframe")
driver.switch_to.default_content()
driver.switch_to.frame(iframe)

#getting to the place where I should input the data
date = driver.find_element_by_id("Data")
date.clear() ***#to clear auto populated data***
date.send_keys(((str(day),str(month),str(year)))) ***# removed the join part***

driver.find_element_by_tag_name("button").click()

#I have put this wait just to be sure it doesn't try to get info from an unloaded page
time.sleep(50)

page = bs(driver.page_source,"html.parser")

table = page.find(id='tb_principal1')

headers = ['Dias Corridos', '252','360']

matrix =
for rows in table.select('tr')[2:]:
values =
for columns in rows.select('td'):
values.append(columns.text.replace(',','.'))
matrix.append(values)

df = pd.DataFrame(data=matrix, columns=headers)

driver.close()

#only the first 2 columns are interesting for my purposes
return df.iloc[:,0:2]

print GetDataFromWeb(3,9,2018)



It will print the matching data for the required date.



I have added #to avoid data error in date handler


if month < 10:
month="0"+str(month)
if day < 10:
day="0"+str(day)



date.clear() #to clear auto populated data
date.send_keys(((str(day),str(month),str(year)))) # removed the join part


date.clear()


date.send_keys(((str(day),str(month),str(year))))



Note the problem in your code was the date& month fields take two digit number and date.send_keys("/".join((str(day), str(month), str(year)))) line was generating an error because of which the system date was picked and you always see same data coming for any input data. Also when you click on the date it was picking default date so first, we have to clear that and send custom date. Hope this helps


date.send_keys("/".join((str(day), str(month), str(year))))



Update for additional query: Add these imports


from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By



Add this line in place of wait


WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR,'#divContainerIframeBmf > form > div > div > div:nth-child(1) > div:nth-child(3) > div > div > p')))






Thank you very much! If it's not too much to ask, could you also give me at least a hint about how can I change this "time.sleep(50)" line? I wanted to use webdriver.wait, with the condition being the website's database query is completed, but I don't know exactly how it does that. Is there a "jquery done" or something like that?

– Guilherme Moreira Barbosa
Sep 13 '18 at 5:45






add this line inplace of sleep WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR,'#divContainerIframeBmf > form > div > div > div:nth-child(1) > div:nth-child(3) > div > div > p'))) with these imports from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions as EC from selenium.webdriver.common.by import By

– thebadguy
Sep 13 '18 at 6:07




Thanks for contributing an answer to Stack Overflow!



But avoid



To learn more, see our tips on writing great answers.



Required, but never shown



Required, but never shown




By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Edmonton

Crossroads (UK TV series)