Scraping and downloading images without a File Extension
I'm trying to use Scrapy's Images/Files pipeline to download images that have no file extension.
For example, this image:
https://burpple-2.imgix.net/foods/3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80
As you can see, the image loads just fine, and I'm able to scrape the URL in Scrapy. However, passing the URL to image_urls or file_urls yields no downloaded images.
I've tried appending ".jpg" to the end of the URL, but that doesn't work.
How would I download these kinds of images?
EDIT:
I have already enabled ImagesPipeline. Downloading from other URLs that have a proper file extension works fine, and I can see the images are downloaded to the designated folders.
python image web-scraping scrapy
Why do you think that this file has no extension? For me it appears as an image/jpeg file.
– Andersson
Nov 13 '18 at 14:50
@Andersson Well, yes, it is a JPEG. But somehow Scrapy is unable to download it even when I append .jpg or .jpeg to the end of the URL. Other websites with proper image URLs work fine, so I don't think it's an issue with my configuration.
– Amir Asyraf
Nov 13 '18 at 16:36
But there is nothing wrong with the image either. I can easily download the file.
– Andersson
Nov 14 '18 at 8:44
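One detail from the exchange above may be worth checking: the example URL carries a query string, so text appended to the very end of the URL lands in the last query parameter rather than in the path. The sketch below uses only the standard library to show this; the helper name `describe` is made up for illustration, and nothing here claims this is why the download failed for the asker.

```python
from os.path import splitext
from urllib.parse import parse_qs, urlsplit

URL = ("https://burpple-2.imgix.net/foods/"
       "3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80")


def describe(url):
    """Return the extension of the URL path and the parsed query string."""
    parts = urlsplit(url)
    _, ext = splitext(parts.path)  # extension is taken from the path only
    return ext, parse_qs(parts.query)


original_ext, original_query = describe(URL)
suffixed_ext, suffixed_query = describe(URL + ".jpg")

# The path ends in a bare dot either way; only the "q" value changes.
print(original_ext, original_query["q"])
print(suffixed_ext, suffixed_query["q"])
```

In both cases the path extension is just ".", while the appended suffix turns the `q` parameter into "80.jpg", so the request is for a different query, not a differently named file.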
asked Nov 13 '18 at 14:36 by Amir Asyraf; edited Nov 14 '18 at 6:38
1 Answer
Have you enabled the ImagesPipeline in your settings?
You should be able to see an INFO log that looks like this:
    2018-11-14 10:37:33 [scrapy.middleware] INFO: Enabled item pipelines:
    ['scrapy.pipelines.images.ImagesPipeline']
This code worked for me:
    from scrapy.spiders import Spider


    class MySpider(Spider):
        name = "burpple-2.imgix.net"
        start_urls = ['https://burpple-2.imgix.net/']

        # Enable the images pipeline and tell it where to store the files.
        custom_settings = {
            'ITEM_PIPELINES': {'scrapy.pipelines.images.ImagesPipeline': 1},
            'IMAGES_STORE': '/some/valid/folder/',
        }

        def parse(self, response):
            # The pipeline downloads every URL listed under 'image_urls'.
            yield {
                'image_urls': ['https://burpple-2.imgix.net/foods/3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80'],
            }
Did you see the image actually downloaded to the folder? I've already enabled ImagesPipeline, and images from other websites with proper image URLs download just fine.
– Amir Asyraf
Nov 14 '18 at 6:36
Yes, I can see the image downloaded locally in the folder. It created a subfolder called full, and the image was in there.
– Guillaume
Nov 14 '18 at 15:23
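To check whether the file really landed in the full subfolder, it can help to know what name to look for. The sketch below assumes the default naming scheme of ImagesPipeline, which (as far as I know) stores each image as full/<SHA-1 of the request URL>.jpg regardless of the extension in the URL; a pipeline with a custom file_path would behave differently. The helper name `expected_image_path` is hypothetical and not part of Scrapy.

```python
import hashlib

IMAGE_URL = ("https://burpple-2.imgix.net/foods/"
             "3d9294008d0f76a92e21647960_original.?w=400&h=400&fit=crop&q=80")


def expected_image_path(url):
    """Mirror the assumed default naming: full/<sha1 hex digest of URL>.jpg."""
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"full/{image_guid}.jpg"


# Path relative to IMAGES_STORE where the image would be expected.
print(expected_image_path(IMAGE_URL))
```

If that file is missing under IMAGES_STORE, the crawl log is the next place to look, since the pipeline normally reports download failures there.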
answered Nov 14 '18 at 2:40 by Guillaume