Node.js puppeteer - How to fetch only certain (filter) records from a table




I'm using Node.js and Puppeteer to scrape some data. From the target table I only want to fetch certain records: specifically, those whose innerText has the form file.*.idx (for example, file.20180702.idx).



This is my current code:


const tableRows = await page.$$('table > tbody tr');
console.log(tableRows.length);

let tableRow;
let tableCell01;
let tableCell01Val;

for (let i = 1; i < tableRows.length; i++) {
  tableRow = tableRows[i];
  // First cell of the row holds the link
  tableCell01 = await tableRow.$('td:nth-child(1) a');
  tableCell01Val = await page.evaluate(tableCell01 => tableCell01.href, tableCell01);

  console.log('\n');
  console.log(tableCell01Val);
}




The output without any filtering is:



Console:


6

file.20180702.idx
file.20180703.idx
file.20180705.idx
sitemap.20180702.xml
sitemap.20180703.xml
sitemap.20180705.xml



So the desired result should be:



Console:


3

file.20180702.idx
file.20180703.idx
file.20180705.idx



What's the best way to do this? Ideally the filter would be applied before the loop, so that tableRows.length also reflects the filtered count.




2 Answers



You can use page.$x() to check the value of the href attribute with an XPath expression before selecting the rows:



const tableRows = await page.$x( '//table/tbody/tr/td[1]/a[starts-with(@href, "file.")]/../..' );



Result:


3

file.20180702.idx
file.20180703.idx
file.20180705.idx
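
For comparison, the same prefix test (plus the .idx suffix from the question) can probably also be written with CSS attribute selectors. This is a minimal sketch, not part of the original answer: the helper name selectIdxHrefs and the constant IDX_LINK_SELECTOR are hypothetical, and it assumes that page is a Puppeteer Page already showing the table and that the href attributes are written as relative names such as file.20180702.idx.

```javascript
// Hypothetical selector: first-column links whose href attribute starts
// with "file." and ends with ".idx" (CSS attribute selectors, no XPath).
const IDX_LINK_SELECTOR =
  'table > tbody tr td:nth-child(1) a[href^="file."][href$=".idx"]';

// Hypothetical helper: returns the matching href attribute values.
// `page` is assumed to be a Puppeteer Page that has already navigated
// to the document containing the table.
async function selectIdxHrefs(page) {
  return page.$$eval(IDX_LINK_SELECTOR, anchors =>
    // getAttribute returns the value as written in the markup; the .href
    // property would return the resolved absolute URL instead.
    anchors.map(a => a.getAttribute('href'))
  );
}
```

Inside an async function this could be used as const hrefs = await selectIdxHrefs(page); after which hrefs.length gives the filtered count.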





I don't think it's necessary to use XPath for this. I prefer using CSS selectors instead.
– jnylen
Aug 31 at 18:19





@jnylen Looks like a perfect opportunity to me. No need to over-complicate things.
– Grant Miller
Aug 31 at 18:33





It may just be personal preference, but I like to use the "standard" query language built in to the browser, CSS, JS, etc, rather than adding another one.
– jnylen
Aug 31 at 18:39





@jnylen You are entitled to your personal preference.
– Grant Miller
Aug 31 at 18:53



I would use page.$$eval (evaluate a function against an array of elements matched by a selector). This will do all of the required operations in a single call to the browser.



Pseudocode (assumes that all first-child tds have an a child):



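const hrefArray = await page.$$eval('table > tbody tr', trs =>
  trs
    // Read the raw attribute: the .href property is an absolute URL and would never match ^file
    .map(tr => tr.querySelector('td:nth-child(1) a').getAttribute('href'))
    // Escape the dots so they match a literal "." only
    .filter(href => /^file\..*\.idx$/.test(href))
);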
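
The filter step can be checked on its own, without a browser. The sketch below applies the same kind of regular expression to the six names from the question's unfiltered output; the helper name isIdxFile is hypothetical, and the sample values assume the href attributes look exactly like the names printed in the question.

```javascript
// Hypothetical helper: true for names such as "file.20180702.idx".
// The dots are escaped so they match a literal "." only.
const isIdxFile = href => /^file\..*\.idx$/.test(href);

// Sample values copied from the question's unfiltered console output.
const allHrefs = [
  'file.20180702.idx',
  'file.20180703.idx',
  'file.20180705.idx',
  'sitemap.20180702.xml',
  'sitemap.20180703.xml',
  'sitemap.20180705.xml',
];

const idxHrefs = allHrefs.filter(isIdxFile);

console.log(idxHrefs.length); // 3
idxHrefs.forEach(href => console.log(href));
```

Because the filtering happens before any loop, the array length is already the filtered count asked for in the question.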


