How to scrape the text 64076 from Item model number using xpath expression




I'm attempting to scrape the text 64076 next to Item model number: on this page using the following XPath expression:


//*[contains (@id,'productDetails')]//tr[contains(.,'Item model number')]/td|//*[contains (@id,'detail')]//descendant::li[contains(.,'Item model number')]/text()

(I'm focusing mainly on the second half of the expression.)



However, although this matches the expected text (64076) in Firebug, it is not found when using Selenium WebDriver (Java).



When I change the XPath to:


//*[contains (@id,'productDetails')]//tr[contains(.,'Item model number')]/td|//*[contains (@id,'detail')]//descendant::li[contains(.,'Item model number')]



It works; however, it also scrapes the text Item model number:, which I do not want. (I know I could parse the result using regex, but I'm trying to understand why my XPath is not working, since I am clearly matching the actual text/number via text(), not the bold text.)





Thanks






Possible duplicate of using XPath: how to exclude text in nested elements

– shmosel
Sep 17 '18 at 1:51




6 Answers



It's because text() in XPath selects a text node, but Selenium can only find and return element nodes. Attribute nodes are likewise valid XPath results that Selenium does not support.





You have to find the parent of the text node (which is an element node), then use regex or split to extract the string you want.


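String xpath = "//ul/li[b[text()='Item model number:']][contains(. , '64076')]";
// getText() returns "Item model number: 64076"; keep the part after the colon
String model = driver.findElement(By.xpath(xpath)).getText().split(":")[1].trim();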



This is a common problem in Selenium. text() is valid XPath 1.0, but it selects a text node, and Selenium's findElement can only return elements. The usual approach is to get the enclosing element and call getText().







Here is a nicely wrapped function to get the text without any text from the children:


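// requires: import java.util.regex.Pattern;
public static String getNodeText(WebElement element) {
    String text = element.getText();
    // Remove the text of each direct child element from the combined text
    for (WebElement child : element.findElements(By.xpath("./*"))) {
        // Pattern.quote makes replaceFirst treat the child text literally, not as a regex
        text = text.replaceFirst(Pattern.quote(child.getText()), "");
    }
    return text.trim();
}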



Of course, you can also use string functions or a regex to extract the string in question, but that probably requires you to write custom extraction logic for each case.
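To illustrate the regex route, here is a minimal sketch of a label-based extractor that could be reused across fields instead of writing one-off parsing each time. The class LabelValueExtractor and the method valueAfterLabel are hypothetical names, not part of Selenium, and the sketch assumes the wanted value is the first whitespace-delimited token after the label.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Selenium): extracts the value that follows a label.
class LabelValueExtractor {

    // Returns the first whitespace-delimited token after the label, if there is one.
    static Optional<String> valueAfterLabel(String fullText, String label) {
        // Pattern.quote keeps characters such as ':' or '.' in the label literal.
        Pattern pattern = Pattern.compile(Pattern.quote(label) + "\\s*(\\S+)");
        Matcher matcher = pattern.matcher(fullText);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: in a real test this string would come from element.getText().
        String liText = "Item model number: 64076";
        System.out.println(valueAfterLabel(liText, "Item model number:").orElse("<not found>"));
    }
}
```

Returning an Optional makes the "label not present" case explicit, which an index into the result of split() would instead surface as an exception.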



You cannot get it directly with Selenium because it is a text node.
You can use JavaScript to walk the child nodes and return the text node's content.


WebElement itemModelRootNode = driver.findElement(By.xpath("//*[contains (@id,'productDetails')]//tr[contains(.,'Item model number')]/td|//*[contains (@id,'detail')]//descendant::li[contains(.,'Item model number')]"));

// Keep the last non-blank text node among the element's direct children
String script = "var t = ''; arguments[0].childNodes.forEach((node) => { if (node.nodeType == Node.TEXT_NODE && node.textContent.trim().length > 0) { t = node.textContent.trim(); } }); return t;";

// executeScript returns Object, so cast the result to String
String text = (String) ((JavascriptExecutor) driver).executeScript(script, itemModelRootNode);
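The same child-node walk can be sketched outside the browser with the JDK's own DOM API, which may help when checking the logic without a running driver. This is only an illustration: the class DirectTextNodes and its methods are hypothetical names, and it assumes the fragment is well-formed XML, which real page source often is not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration: mirrors the JavaScript childNodes loop using org.w3c.dom.
class DirectTextNodes {

    // Returns the last non-blank text node that is a direct child of the element.
    static String lastDirectText(Element element) {
        String result = "";
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            boolean isText = child.getNodeType() == Node.TEXT_NODE;
            if (isText && !child.getTextContent().trim().isEmpty()) {
                result = child.getTextContent().trim();
            }
        }
        return result;
    }

    // Parses a well-formed XML fragment and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Element li = parseRoot("<li><b>Item model number:</b> 64076</li>");
        System.out.println(lastDirectText(li));
    }
}
```

The text inside the b element is skipped because it belongs to a child element, not to a text node directly under li.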



See @Bauban's answer for more detail: Selenium doesn't allow locating a text node. Instead, you can evaluate your XPath with JavaScript's evaluate() method through JavascriptExecutor.







This is your XPath:


//div[@class='content']//li[contains(.,'Item model number:')]/text()



And this is how you can evaluate:


JavascriptExecutor js = (JavascriptExecutor)driver;
// The inner double quotes must be escaped inside the Java string literal
Object message = js.executeScript("var value = document.evaluate(\"//div[@class='content']//li[contains(.,'Item model number:')]/text()\", document, null, XPathResult.STRING_TYPE, null); return value.stringValue;");
System.out.println(message.toString().trim());



You can refer to this link for more details about the evaluate function.



As per the URL you shared, the text 64076 next to Item model number: on this page is a text node. Use WebDriverWait to wait for its parent element to become visible, then read the text node through JavascriptExecutor, as in the following solution:



Code Block:


import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class q52359631_textExtract {

    public static void main(String[] args) {
        // Backslashes must be escaped in a Java string literal
        System.setProperty("webdriver.gecko.driver", "C:\\Utility\\BrowserDrivers\\geckodriver.exe");
        WebDriver driver = new FirefoxDriver();
        driver.get("https://www.amazon.com/dp/B000TW3B9G/?tag=stackoverflow17-20");
        // Wait for the <li> that contains the label, then read its last child (the text node)
        WebElement myElement = new WebDriverWait(driver, 20).until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//td[@class='bucket']//li/b[contains(.,'Item model number:')]/..")));
        String myText = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].lastChild.textContent;", myElement);
        System.out.println(myText);
    }
}




Console Output:


64076



In the browser console, try the following for Item model number: 64076 on the test URL:




var xpathExp =
"//h2[.='Product details']//parent::td//div[@class='content']/ul/li/b[contains(text(),'Item')]/parent::li/text()";
var ele = $x(xpathExp);
console.dir( ele ); // Array(1)
console.log( ele[0] ); //" 64076"



Test XML XPath online:


XPath online


<ul>
<li>
<b>Item model number:</b> 64076
</li>
</ul>



XML Tree View (codebeautify):


//ul/li/b[contains(text(),'Item')]/parent::li/text()


ul ..
li 64076 ..
b Item model number:
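The same check can be run offline with the JDK's javax.xml.xpath package. The sketch below evaluates the expression against a compact copy of the sample XML and confirms that the result is a text node, which is the node type Selenium cannot return. The class XPathTextNodeDemo and the method selectNode are hypothetical names, and the XML is written on one line on purpose: with the line breaks of the pretty-printed sample, the first text() child of li would be a whitespace-only node.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates a text() expression with the JDK XPath engine.
class XPathTextNodeDemo {

    // Compact form of the sample XML (no whitespace before <b>).
    static final String XML = "<ul><li><b>Item model number:</b> 64076</li></ul>";
    static final String EXPR = "//ul/li/b[contains(text(),'Item')]/parent::li/text()";

    // Returns the first node selected by the expression, or null if nothing matches.
    static Node selectNode(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(xml));
            return (Node) xpath.evaluate(expr, source, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expr, e);
        }
    }

    public static void main(String[] args) {
        Node node = selectNode(XML, EXPR);
        // The selected node is a text node, not an element.
        System.out.println("is text node: " + (node.getNodeType() == Node.TEXT_NODE));
        System.out.println("value: " + node.getNodeValue().trim());
    }
}
```

This shows the expression itself is sound; the limitation lies in what WebDriver's findElement is able to hand back.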



HTML as a JavaScript object


outerHTML:"<li><b>Item model number:</b> 64076</li>"
outerText:"Item model number: 64076"

tagName:"LI"
textContent:"Item model number: 64076"

lastChild:text
data:" 64076"
nodeValue:" 64076"
textContent:" 64076"
wholeText:" 64076"
lastElementChild:b


