Scraping and downloading images without a File Extension











I'm trying to use Scrapy's images/files pipeline to download images whose URLs have no file extension.



For example, this image:



https://burpple-2.imgix.net/foods/3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80



As you can see, the image loads just fine, and I'm able to scrape the URL in Scrapy. However, passing the URL to image_urls or file_urls yields no downloaded images.



I've tried appending ".jpg" to the end of the URL, but that doesn't work.



How would I download this kind of image?



EDIT:



I have already enabled ImagesPipeline. Downloading from other URLs that have a proper file extension works fine, and I can see the images being downloaded to the designated folders.











  • Why do you think that this file has no extension? For me it appears as an image/jpeg file.

    – Andersson
    Nov 13 '18 at 14:50











  • @Andersson Well, yes, it is a JPEG. But somehow Scrapy is unable to download it, even when I append .jpg or .jpeg to the end of the URL. Other websites with proper image URLs work fine, so I don't think it's an issue with my configuration.

    – Amir Asyraf
    Nov 13 '18 at 16:36












  • But there is nothing wrong with the image either. I can easily download the file.

    – Andersson
    Nov 14 '18 at 8:44
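A side note on appending ".jpg": the following is a minimal sketch using only Python's standard library, not anything taken from Scrapy itself. It suggests that the example URL's path ends in a bare dot, and that text appended to the end of the full URL lands in the query string (the `q` parameter) rather than in the path, which the image server may reject or ignore. The helper name `describe` is hypothetical.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit

URL = (
    "https://burpple-2.imgix.net/foods/"
    "3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80"
)


def describe(url):
    """Return (path, extension taken from the path, parsed query parameters)."""
    parts = urlsplit(url)
    # The extension is read from the path only, never from the query string.
    ext = posixpath.splitext(parts.path)[1]
    return parts.path, ext, parse_qs(parts.query)


path, ext, query = describe(URL)
path_jpg, ext_jpg, query_jpg = describe(URL + ".jpg")

print(ext)             # the path ends in a bare dot
print(query_jpg["q"])  # the suffix ended up in the q parameter
```

If a suffix is really needed, it would have to be inserted into the path before the `?`, but the comments above indicate the original URL already serves a JPEG as it stands.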















python image web-scraping scrapy






asked Nov 13 '18 at 14:36 by Amir Asyraf, edited Nov 14 '18 at 6:38













1 Answer

Have you enabled the ImagesPipeline in your settings?



You should be able to see an INFO log that looks like this:



    2018-11-14 10:37:33 [scrapy.middleware] INFO: Enabled item pipelines:
    ['scrapy.pipelines.images.ImagesPipeline']


This code worked for me:



    from scrapy.spiders import Spider


    class MySpider(Spider):
        name = "burpple-2.imgix.net"
        start_urls = ['https://burpple-2.imgix.net/']

        # Enable the images pipeline and tell it where to store the files.
        custom_settings = {
            'ITEM_PIPELINES': {'scrapy.pipelines.images.ImagesPipeline': 1},
            'IMAGES_STORE': '/some/valid/folder/',
        }

        def parse(self, response):
            # The pipeline downloads every URL listed under 'image_urls'.
            yield {
                'image_urls': ['https://burpple-2.imgix.net/foods/3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80'],
            }







  • Did you see the image actually downloaded to the folder? I've already enabled ImagesPipeline, and images from other websites with proper image URLs download just fine.

    – Amir Asyraf
    Nov 14 '18 at 6:36











  • Yes, I can see the image downloaded locally in the folder. It created a subfolder called full, and the image was in there.

    – Guillaume
    Nov 14 '18 at 15:23
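To know which file to look for in the full subfolder, here is a rough sketch of how the stored name can be predicted. It assumes the default ImagesPipeline naming, which to my knowledge is the SHA-1 hex digest of the request URL with a .jpg suffix; check this against your Scrapy version, and note that it no longer applies if file_path is overridden. The function name `default_image_path` is hypothetical and only the standard library is used.

```python
import hashlib


def default_image_path(url: str) -> str:
    """Predict the path, relative to IMAGES_STORE, for a downloaded image.

    Assumption: default ImagesPipeline naming, i.e. SHA-1 of the URL
    plus a fixed '.jpg' suffix, regardless of any extension in the URL.
    """
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"full/{image_guid}.jpg"


url = (
    "https://burpple-2.imgix.net/foods/"
    "3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80"
)
print(default_image_path(url))
```

Under that assumption the stored name never depends on an extension in the URL, so a missing extension should not by itself prevent the download.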











answered Nov 14 '18 at 2:40 by Guillaume











