Download file using HtmlUnit




I am trying to download an XLS file from a website. When I click the link to download the file, I get a JavaScript confirm box. I handle it as shown below.


ConfirmHandler okHandler = new ConfirmHandler() {
    public boolean handleConfirm(Page page, String message) {
        // Always answer "OK" to the confirm box
        return true;
    }
};
webClient.setConfirmHandler(okHandler);



There is a link to download file.


<a href="./my_file.php?mode=xls&amp;w=d2hlcmUgc2VsbElkPSd3b3JsZGNvbScgYW5kIHN0YXR1cz0nV0FJVERFTEknIGFuZCBkYXRlIDw9IC0xMzQ4MTUzMjAwICBhbmQgZGF0ZSA%2BPSAtMTM1MDgzMTU5OSA%3D" target="actionFrame" onclick="return confirm('Do you want do download XLS file?')"><u>Download</u></a>



I click the link using


HtmlPage x = webClient.getPage("http://working.com/download");
HtmlAnchor anchor = (HtmlAnchor) x.getFirstByXPath("//a[@target='actionFrame']");
anchor.click();



The handleConfirm() method is executed, but I have no idea how to save the file stream from the server. I tried to inspect the stream with the code below.


anchor.click().getWebResponse().getContentAsString();



But the result is the same as the page x. Does anyone know how to capture the stream from the server? Thank you.





anchor.click() will return a page. That should contain your XLS file
– Lee
Jan 9 '13 at 3:46








see my answer to a similar question at stackoverflow.com/a/28471835/612123
– culmat
Feb 12 '15 at 7:39




6 Answers



I found a way to get the InputStream using a WebWindowListener. Inside webWindowContentChanged(WebWindowEvent event), I put the code below.


InputStream xls = event.getWebWindow().getEnclosedPage().getWebResponse().getContentAsStream();



Once I have xls, I can save the file to my hard disk.
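For that last step, here is a minimal sketch of writing a stream to disk with plain JDK calls. The class name `StreamSaver` and the temporary target file are placeholders and not part of HtmlUnit; the `InputStream` is assumed to be the one returned by `getContentAsStream()` above, and a byte array stands in for it here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamSaver {

    /**
     * Copies the whole stream to the target file, replacing any existing file,
     * closes the stream, and returns the number of bytes written.
     */
    public static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream obtained from the web response (assumption).
        InputStream xls = new ByteArrayInputStream(new byte[] {1, 2, 3});
        Path target = Files.createTempFile("download", ".xls");
        System.out.println(save(xls, target) + " bytes written to " + target);
    }
}
```

In the listener, the call would be along the lines of `StreamSaver.save(xls, Paths.get("some/path.xls"))`, with the path chosen by the caller.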





I am downloading a CSV file. Can you please explain what event is, and when you call click on the anchor? I don't have a confirmation box for downloading the file.
– Naveen
Sep 22 '13 at 7:39




I made this based on your post. Note: you can change the content-type condition to download only specific types of file (e.g. application/octet-stream, application/pdf, etc.).


package net.s4bdigital.export.main;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;

import com.gargoylesoftware.htmlunit.ConfirmHandler;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.WebWindowEvent;
import com.gargoylesoftware.htmlunit.WebWindowListener;
import com.gargoylesoftware.htmlunit.util.NameValuePair;

public class HtmlUnitDownloadFile {

    protected String baseUrl;
    protected static WebDriver driver;

    @Before
    public void openBrowser() {
        baseUrl = "http://localhost/teste.html";
        driver = new CustomHtmlUnitDriver();
        ((HtmlUnitDriver) driver).setJavascriptEnabled(true);
    }

    @Test
    public void downloadAFile() throws Exception {
        driver.get(baseUrl);
        driver.findElement(By.linkText("click to Downloadfile")).click();
    }

    public class CustomHtmlUnitDriver extends HtmlUnitDriver {

        // This is the magic. Keep a reference to the client instance
        protected WebClient modifyWebClient(WebClient client) {

            // Accept the JavaScript confirm box shown before the download
            ConfirmHandler okHandler = new ConfirmHandler() {
                public boolean handleConfirm(Page page, String message) {
                    return true;
                }
            };
            client.setConfirmHandler(okHandler);

            client.addWebWindowListener(new WebWindowListener() {

                public void webWindowOpened(WebWindowEvent event) {
                    // TODO Auto-generated method stub
                }

                public void webWindowContentChanged(WebWindowEvent event) {
                    WebResponse response = event.getWebWindow().getEnclosedPage().getWebResponse();
                    System.out.println(response.getLoadTime());
                    System.out.println(response.getStatusCode());
                    System.out.println(response.getContentType());

                    List<NameValuePair> headers = response.getResponseHeaders();
                    for (NameValuePair header : headers) {
                        System.out.println(header.getName() + " : " + header.getValue());
                    }

                    // Change or add conditions for the content types that you
                    // would like to receive as a file.
                    if (response.getContentType().equals("text/plain")) {
                        getFileResponse(response, "target/testDownload.war");
                    }
                }

                public void webWindowClosed(WebWindowEvent event) {
                }
            });

            return client;
        }
    }

    public static void getFileResponse(WebResponse response, String fileName) {

        InputStream inputStream = null;
        OutputStream outputStream = null;

        try {
            inputStream = response.getContentAsStream();

            // write the inputStream to a FileOutputStream
            outputStream = new FileOutputStream(new File(fileName));

            int read = 0;
            byte[] bytes = new byte[1024];

            while ((read = inputStream.read(bytes)) != -1) {
                outputStream.write(bytes, 0, read);
            }

            System.out.println("Done!");

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (inputStream != null) {
                try {
                    inputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (outputStream != null) {
                try {
                    // outputStream.flush();
                    outputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
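The note above suggests changing the content-type condition. One possible way to accept several types at once is a small lookup helper, sketched here with the JDK only. `DownloadTypes`, `isDownloadable`, and the allow-list itself are hypothetical and should be adjusted for the target site. Stripping `;` parameters and lowercasing is purely defensive, for the case where the value is read from a raw `Content-Type` header.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DownloadTypes {

    // Hypothetical allow-list of media types to treat as file downloads.
    private static final Set<String> ACCEPTED = new HashSet<>(Arrays.asList(
            "application/octet-stream",
            "application/pdf",
            "application/vnd.ms-excel",
            "text/csv"));

    /** Returns true when the media type (ignoring parameters and case) is on the allow-list. */
    public static boolean isDownloadable(String contentType) {
        if (contentType == null) {
            return false;
        }
        int semicolon = contentType.indexOf(';');
        String mediaType = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return ACCEPTED.contains(mediaType.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isDownloadable("application/pdf"));          // true
        System.out.println(isDownloadable("Text/CSV; charset=UTF-8"));  // true
        System.out.println(isDownloadable("text/html"));                // false
    }
}
```

Inside `webWindowContentChanged`, the condition would then read `if (DownloadTypes.isDownloadable(response.getContentType()))`.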












I'm sorry, but I don't get it. Where or how exactly are you keeping the reference to the WebClient in the modifyWebClient method? Thanks.
– Anudeep Samaiya
Sep 12 '15 at 9:03







selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/… Anudeep Samaiya, it is a method of the superclass. We can override it to add a handler for the confirm dialog of the file download. But you need to change the expected content type for your case.
– Eduardo Fabricio
Sep 13 '15 at 1:08






It really does work like magic. Works smoothly.
– viralpatel
Jan 5 '17 at 9:43





I have faced one problem: it downloads the file, but not completely. Only half of the file's content is there.
– viralpatel
Mar 9 '17 at 10:42





@viralpatel I never faced it. However, I have a clue: did you already verify the "Content-Length" header in the HTTP response in your specific case? Is it correct?
– Eduardo Fabricio
Mar 10 '17 at 17:57
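The check suggested in this comment can be sketched as a comparison between the `Content-Length` header value and the number of bytes actually saved. `DownloadCheck` and its methods are hypothetical names. The header value is assumed to have been read from the response beforehand, and a missing or unparsable header is treated as "cannot verify". If the server compresses the response, the header may count the encoded bytes, so a mismatch is only a hint.

```java
import java.util.OptionalLong;

public class DownloadCheck {

    /** Parses a Content-Length value; empty when absent or not a non-negative number. */
    public static OptionalLong parseContentLength(String headerValue) {
        if (headerValue == null) {
            return OptionalLong.empty();
        }
        try {
            long length = Long.parseLong(headerValue.trim());
            return length >= 0 ? OptionalLong.of(length) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /** True when a declared length is known and differs from the number of bytes saved. */
    public static boolean looksTruncated(String headerValue, long bytesSaved) {
        OptionalLong declared = parseContentLength(headerValue);
        return declared.isPresent() && declared.getAsLong() != bytesSaved;
    }

    public static void main(String[] args) {
        System.out.println(looksTruncated("1000", 512));   // true
        System.out.println(looksTruncated("1000", 1000));  // false
        System.out.println(looksTruncated(null, 512));     // false
    }
}
```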



There's an easier way if you're not into wrapping HtmlUnit with Selenium. Simply provide HtmlUnit's WebClient with the extended WebWindowListener.



You could also use Apache Commons IO for easy stream copying.


WebClient webClient = new WebClient();
webClient.addWebWindowListener(new WebWindowListener() {
    public void webWindowOpened(WebWindowEvent event) { }

    public void webWindowContentChanged(WebWindowEvent event) {
        // NOTE: `response` is not declared in this snippet. Presumably it is
        // obtained as in the answer above, via
        // event.getWebWindow().getEnclosedPage().getWebResponse().

        // Change or add conditions for the content types that you would
        // like to receive as a file.
        if (response.getContentType().equals("text/plain")) {
            try {
                IOUtils.copy(response.getContentAsStream(), new FileOutputStream("downloaded_file"));
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public void webWindowClosed(WebWindowEvent event) { }
});





how to get response in webWindowContentChanged method?
– Vahe Harutyunyan
May 20 at 13:41


final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setTimeout(2000);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.waitForBackgroundJavaScript(2000);

// get the general page
final HtmlPage page = webClient.getPage("http://your");

// get the frame
final HtmlPage frame = (HtmlPage) page.getFrameByName("Frame").getEnclosedPage();

// answer "OK" to the confirm box
webClient.setConfirmHandler(new ConfirmHandler() {
    public boolean handleConfirm(Page page, String message) {
        return true;
    }
});

// get the file element (looked up in the frame declared above)
final DomElement file = frame.getElementByName("File");

final InputStream xls = file.click().getWebResponse().getContentAsStream();

assertNotNull(xls);



Expanding on Roy's answer, here's my solution to this problem:


public static void prepareForDownloadingFile(WebClient webClient, File output) {
    webClient.addWebWindowListener(new WebWindowListener() {

        public void webWindowOpened(WebWindowEvent event) {
        }

        public void webWindowContentChanged(WebWindowEvent event) {
            Page page = event.getNewPage();
            FileOutputStream fos = null;
            InputStream is = null;
            if (page != null && page instanceof UnexpectedPage) {
                try {
                    fos = new FileOutputStream(output);
                    UnexpectedPage uPage = (UnexpectedPage) page;
                    is = uPage.getInputStream();
                    IOUtils.copy(is, fos);
                    // One download per registration: remove this listener afterwards
                    webClient.removeWebWindowListener(this);
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    try {
                        if (fos != null)
                            fos.close();
                        if (is != null)
                            is.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        public void webWindowClosed(WebWindowEvent event) {
        }
    });
}



I felt there were enough differences to make it a new answer:

- Doesn't have a magic variable (response)
- Closes InputStream and FileOutputStream
- Looks for UnexpectedPage to determine we're not on an HTML page
- Downloads a file one time after requesting, then removes itself
- Doesn't require knowing the ContentType





Calling this once before, for example, clicking a button that initiates a download, will download that file.



Figure out the download URLs and collect them in a List. From each download URL we can then fetch the entire file using this code.


try {
    String path = "your destination path";
    // getHrefAttribute() is defined on HtmlAnchor, so the XPath is expected to select <a> elements
    List<HtmlAnchor> downloadFiles = (List<HtmlAnchor>) page.getByXPath("the tag you want to scrape");
    if (downloadFiles.isEmpty()) {
        System.out.println("No items found!");
    } else {
        for (HtmlAnchor htmlItem : downloadFiles) {
            String downloadUrl = htmlItem.getHrefAttribute();

            Page invoicePdf = client.getPage(downloadUrl);
            if (invoicePdf.getWebResponse().getContentType().equals("application/pdf")) {
                System.out.println("Creating PDF:");
                // try-with-resources closes the output file after copying
                try (FileOutputStream out = new FileOutputStream(path + "file name")) {
                    IOUtils.copy(invoicePdf.getWebResponse().getContentAsStream(), out);
                }
            }
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
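An href scraped from a page may be relative, like the `./my_file.php?mode=xls…` link in the question, and would then need to be resolved against the page URL before it can be fetched on its own. Here is a minimal sketch with `java.net.URI`. `LinkResolver` is a hypothetical helper, and the base URL is the example address from the question. HtmlUnit's `HtmlPage.getFullyQualifiedUrl` serves a similar purpose when an `HtmlPage` is at hand.

```java
import java.net.URI;

public class LinkResolver {

    /** Resolves an href (relative or absolute) against the URL of the page it was found on. */
    public static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Relative link, as in the question's anchor tag (query shortened for the example).
        System.out.println(resolve("http://working.com/download", "./my_file.php?mode=xls"));
        // An absolute link is returned unchanged.
        System.out.println(resolve("http://working.com/download", "http://other.example/a.pdf"));
    }
}
```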





