Node.js puppeteer - How to fetch only certain (filter) records from a table
I'm using Node.js and Puppeteer to get some data. From the target table I only want to fetch certain records: more specifically, those whose innerText matches the pattern 'file. ........ .idx' (for example, file.20180702.idx).
My current code:
const tableRows = await page.$$('table > tbody tr');
console.log(tableRows.length);
let tableCell01;
let tableCell01Val;
for (let i = 1; i < tableRows.length; i++) {
    const tableRow = tableRows[i];
    tableCell01 = await tableRow.$('td:nth-child(1) a');
    tableCell01Val = await page.evaluate( tableCell01 => tableCell01.href, tableCell01 );
    console.log('\n');
    console.log(tableCell01Val);
}
The output without filtering is:
Console:
6
file.20180702.idx
file.20180703.idx
file.20180705.idx
sitemap.20180702.xml
sitemap.20180703.xml
sitemap.20180705.xml
So the desired result should be:
Console:
3
file.20180702.idx
file.20180703.idx
file.20180705.idx
What's the best way to do this? Ideally the filtering would happen before the loop, so that tableRows.length is also correct.
2 Answers
You can use page.$x() to check the value of the href attribute with an XPath expression before selecting the rows:
const tableRows = await page.$x( '//table/tbody/tr/td[1]/a[starts-with(@href, "file.")]/../..' );
Result:
3
file.20180702.idx
file.20180703.idx
file.20180705.idx
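The expression in this answer checks only the "file." prefix. If the ".idx" suffix should be enforced as well, the XPath can be extended. The sketch below builds such an expression as a string; `buildRowXPath` is a hypothetical helper name, and it assumes that neither argument contains a double quote. XPath 1.0 has no ends-with() function, so the trailing substring is compared instead.

```javascript
// Hypothetical helper: builds an XPath that selects <tr> elements whose
// first-cell link has an href starting with `prefix` and ending with `suffix`.
// Assumes a non-empty suffix and no double quotes in either argument.
function buildRowXPath(prefix, suffix) {
  // XPath 1.0 has no ends-with(), so compare the last suffix.length characters.
  const endsWith =
    `substring(@href, string-length(@href) - ${suffix.length - 1}) = "${suffix}"`;
  return `//table/tbody/tr[td[1]/a[starts-with(@href, "${prefix}") and ${endsWith}]]`;
}

const xpath = buildRowXPath('file.', '.idx');
console.log(xpath);
```

The resulting string could then be passed to page.$x() in place of the literal expression shown in this answer.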
@jnylen Looks like a perfect opportunity to me. No need to over-complicate things.
– Grant Miller
Aug 31 at 18:33
It may just be personal preference, but I like to use the "standard" query language built in to the browser, CSS, JS, etc, rather than adding another one.
– jnylen
Aug 31 at 18:39
@jnylen You are entitled to your personal preference.
– Grant Miller
Aug 31 at 18:53
I would use page.$$eval (evaluate a function against an array of elements matched by a selector). This will do all of the required operations in a single call to the browser.
Sketch (assumes that every first-child td contains an a element):
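const hrefArray = await page.$$eval( 'table > tbody tr', trs =>
    trs
        // getAttribute returns the href as written in the markup;
        // the .href property would return the resolved absolute URL.
        .map( tr => tr.querySelector( 'td:nth-child(1) a' ).getAttribute( 'href' ) )
        .filter( href => /^file\..*\.idx$/.test( href ) )
);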
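The filter step can be checked on its own, without a browser. The following is a minimal sketch that runs a similar pattern over the sample names from the question; `isIdxFile` is a hypothetical helper name. The dots are escaped so that they match a literal "." rather than any character.

```javascript
// Hypothetical helper: keeps only names of the form file.<something>.idx.
const isIdxFile = name => /^file\..+\.idx$/.test(name);

// Sample values copied from the output shown in the question.
const names = [
  'file.20180702.idx',
  'file.20180703.idx',
  'file.20180705.idx',
  'sitemap.20180702.xml',
  'sitemap.20180703.xml',
  'sitemap.20180705.xml',
];

const idxFiles = names.filter(isIdxFile);
console.log(idxFiles.length);                 // number of matching records
idxFiles.forEach(name => console.log(name));  // one matching name per line
```

Note that the function passed to page.$$eval runs in the browser context, so a helper defined in Node.js scope is not visible there; the regular expression would need to be written inside that callback.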
I don't think it's necessary to use XPath for this. I prefer using CSS selectors instead.
– jnylen
Aug 31 at 18:19