Scraping and downloading images without a file extension

I'm trying to use Scrapy's Image/File pipeline to download images without any file extension.

For example, this image:

https://burpple-2.imgix.net/foods/3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80

As you can see, the image loads just fine, and I'm able to scrape the URL in Scrapy. However, passing the URL to image_urls or file_urls yields no downloaded images.

I've tried appending ".jpg" to the end of the URL, but that doesn't work.

How would I download this kind of image?

EDIT:

I have already enabled ImagesPipeline. Downloading from other URLs that have a proper file extension works fine, and I can see the images being downloaded to the designated folder.

edited Nov 14 '18 at 6:38

asked Nov 13 '18 at 14:36

Amir Asyraf


Why do you think that this file has no extension? For me it appears as an image/jpeg file.

– Andersson
Nov 13 '18 at 14:50

@Andersson Well, yes, it is a JPEG. But somehow Scrapy is unable to download it, even when I append .jpg or .jpeg to the end of the URL. Other websites with proper image URLs work fine, so I don't think it's an issue with my configuration.

– Amir Asyraf
Nov 13 '18 at 16:36

But there is nothing wrong with the image either. I can easily download the file.

– Andersson
Nov 14 '18 at 8:44
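
For anyone who wants to confirm that a downloaded body really is a JPEG regardless of what the URL looks like, here is a minimal sketch that inspects the leading bytes instead of the file extension. The function name sniff_image_type is made up for illustration, and only two signatures are checked; it is not part of Scrapy.

```python
from typing import Optional

# Leading byte signatures ("magic numbers") for two common image formats.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess the image type from the first bytes of the payload.

    Hypothetical helper: returns "jpeg", "png", or None if neither
    signature matches. The URL and its extension are never consulted.
    """
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(PNG_MAGIC):
        return "png"
    return None
```

In a spider callback this could be applied to response.body; a result of "jpeg" would agree with the image/jpeg observation above.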

Tags: python, image, web-scraping, scrapy


1 Answer

Have you enabled the ImagesPipeline in your settings?

You should be able to see an INFO log that looks like this:

2018-11-14 10:37:33 [scrapy.middleware] INFO: Enabled item pipelines:
['scrapy.pipelines.images.ImagesPipeline']

This code worked for me:

    from scrapy.spiders import Spider


    class MySpider(Spider):
        name = "burpple-2.imgix.net"
        start_urls = ['https://burpple-2.imgix.net/']

        # Enable the built-in images pipeline and set the download folder.
        custom_settings = {
            'ITEM_PIPELINES': {'scrapy.pipelines.images.ImagesPipeline': 1},
            'IMAGES_STORE': '/some/valid/folder/',
        }

        def parse(self, response):
            # The pipeline fetches every URL listed under 'image_urls'.
            yield {
                'image_urls': ['https://burpple-2.imgix.net/foods/3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80'],
            }

answered Nov 14 '18 at 2:40

Guillaume


Did you see the image actually downloaded to the folder? I've already enabled ImagesPipeline, and images from other websites with proper image URLs download just fine.

– Amir Asyraf
Nov 14 '18 at 6:36

Yes, I can see the image downloaded locally in the folder. It created a subfolder called full, and the image was in there.

– Guillaume
Nov 14 '18 at 15:23
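
To know which file to look for inside the full subfolder, the sketch below reproduces what I understand to be the default naming scheme of ImagesPipeline: the SHA-1 hex digest of the request URL, always with a .jpg suffix, so the extension in the URL should not affect the stored name. The helper name expected_image_path is made up for illustration, and the scheme is an assumption that may differ between Scrapy versions or when file_path is overridden.

```python
import hashlib


def expected_image_path(url: str) -> str:
    """Return the path, relative to IMAGES_STORE, where the image is expected.

    Hypothetical helper mirroring the assumed default behaviour:
    full/<sha1 hex digest of the URL>.jpg
    """
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"full/{image_guid}.jpg"


url = (
    "https://burpple-2.imgix.net/foods/"
    "3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80"
)
print(expected_image_path(url))
```

If no file with that name shows up under IMAGES_STORE after a crawl, the spider log is the next place to check for a download or pipeline error on that URL.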
