SQLAlchemy: Scan huge tables using ORM?





















I am currently playing around with SQLAlchemy a bit, which is really quite neat.

For testing, I created a huge table containing my picture archive, indexed by SHA1 hashes (to remove duplicates :-)). That was impressively fast...

For fun I did the equivalent of a select * over the resulting SQLite database:

session = Session()
for p in session.query(Picture):
    print(p)

I expected to see hashes scrolling by, but instead it just kept scanning the disk. At the same time, memory usage was skyrocketing, reaching 1 GB after a few seconds. This seems to come from the identity map feature of SQLAlchemy, which I thought was only keeping weak references.

Can somebody explain this to me? I thought that each Picture p would be collected after its hash is written out!?










python performance orm sqlalchemy






asked Jul 17 '09 at 22:07 by Bluehorn · edited Feb 2 '14 at 20:46 by grebneke · 35 votes

  • Noticed this same problem myself. If I do len(Session.query(Model).limit(100).all()), I get 1. If I remove the limit, memory use skyrockets.
    – Sarah Vessels
    Aug 3 '11 at 20:47




























3 Answers






49 votes · answered Jul 17 '09 at 22:23 by Bluehorn · edited Mar 12 '15 at 13:19 by Zitrax













Okay, I just found a way to do this myself. Changing the code to

session = Session()
for p in session.query(Picture).yield_per(5):
    print(p)

loads only 5 pictures at a time. It seems the query loads all rows at once by default. However, I don't yet understand the disclaimer on that method. Quote from the SQLAlchemy docs:

    WARNING: use this method with caution; if the same instance is present in more than one batch of rows, end-user changes to attributes will be overwritten.
    In particular, it’s usually impossible to use this setting with eagerly loaded collections (i.e. any lazy=False) since those collections will be cleared for a new load when encountered in a subsequent result batch.

So if using yield_per is actually the right way (tm) to scan over copious amounts of SQL data while using the ORM, when is it safe to use it?
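For illustration, a minimal sketch of the simplest safe case (one result row per instance, nothing eagerly loaded), querying only the hash column instead of full Picture objects; the column name sha1 is an assumption, not taken from the question's schema:

session = Session()

# Selecting just the column avoids building full Picture instances at all,
# and yield_per() keeps only about 1000 ORM-buffered rows at a time.
# Note: the DBAPI itself may still pre-buffer the full result set
# (see the comments below).
for (sha1,) in session.query(Picture.sha1).yield_per(1000):
    print(sha1)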
























  • yield_per is safe when you have one result SQL row per resulting instance. You get extra rows per instance when you eagerload or join a one-to-many relation. If you need more details about the circumstances, you could create a separate question about yield_per.
    – Ants Aasma
    Jul 28 '09 at 14:48










  • Something to note: This throws an exception if you try to commit while doing this. See stackoverflow.com/questions/12233115/…
    – Theron Luhn
    Feb 2 '13 at 21:26






  • After a long battle with memory leaks in MySQL I saw this in the yield_per docs: Also note that while yield_per() will set the stream_results execution option to True, currently this is only understood by psycopg2 dialect which will stream results using server side cursors instead of pre-buffer all rows for this query. Other DBAPIs pre-buffer all rows before making them available. Workaround here: stackoverflow.com/a/3699677/281469
    – bcoughlan
    Dec 10 '13 at 15:13
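To make the streaming behaviour described in that last comment explicit, here is a minimal Core-level sketch. It assumes a PostgreSQL/psycopg2 engine named engine and a Core Table object named pictures with a sha1 column; those names are illustrative, and other DBAPIs may still buffer the whole result client-side:

from sqlalchemy import select

with engine.connect() as conn:
    # stream_results asks the DBAPI for a server-side cursor (psycopg2 only,
    # per the docs quoted above); rows are then fetched as you iterate.
    result = conn.execution_options(stream_results=True).execute(
        select([pictures.c.sha1]))
    for row in result:
        print(row.sha1)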

















27 votes · answered Aug 2 '09 at 1:28 by zzzeek · edited Jul 14 '12 at 14:22













Here's what I usually do for this situation:

def page_query(q):
    offset = 0
    while True:
        r = False
        for elem in q.limit(1000).offset(offset):
            r = True
            yield elem
        offset += 1000
        if not r:
            break

for item in page_query(Session.query(Picture)):
    print(item)

This also avoids the various buffering that DBAPIs do (such as psycopg2 and MySQLdb). It still needs to be used appropriately if your query has explicit JOINs, although eagerly loaded collections are guaranteed to load fully, since they are applied to a subquery which has the actual LIMIT/OFFSET supplied.

I have noticed that PostgreSQL takes almost as long to return the last 100 rows of a large result set as it does to return the entire result (minus the actual row-fetching overhead), since OFFSET just does a simple scan of the whole thing.
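Since that OFFSET cost grows with the depth of the page, an alternative worth sketching is keyset pagination: order by a unique, indexed column and resume each page from the last value seen. The sketch below assumes the SHA1 hash lives in a unique, indexed column named Picture.sha1 (an illustrative name, not taken from the question):

def keyset_query(session, page_size=1000):
    # Page through Picture ordered by sha1, without using OFFSET.
    last_sha1 = None
    while True:
        q = session.query(Picture).order_by(Picture.sha1)
        if last_sha1 is not None:
            # Resume just after the last row of the previous page.
            q = q.filter(Picture.sha1 > last_sha1)
        batch = q.limit(page_size).all()
        if not batch:
            break
        for picture in batch:
            yield picture
        last_sha1 = batch[-1].sha1

for picture in keyset_query(Session()):
    print(picture)

Each page then becomes an index range scan, so the cost per page stays roughly constant no matter how far into the table you are.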




























  • I think you can get rid of one assignment in the loop, like: while True: elem = None; for elem .... offset += 1000; if elem==None: break; offset += 1000
    – Dave
    Jan 6 '11 at 17:02











  • No, Dave, you need to set elem = None at the end of the loop or it will never terminate.
    – George V. Reilly
    Jul 14 '12 at 2:18






  • Note that slicing the query will give you the same effect: q[offset:offset+1000]. (A combined sketch based on these comments follows after this list.)
    – Martijn Pieters
    Aug 24 '12 at 16:01










  • See also @zzzeek's more recent answer on a similar subject ... linking to more info and code on the wiki
    – hangtwenty
    Jun 28 '16 at 0:07











  • it helped me. thank you
    – krinker
    Sep 13 '16 at 19:29
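Putting the comments together, a sketch of the same pager using Query slicing and a sentinel instead of the r flag (page_size and the names are illustrative):

def page_query(q, page_size=1000):
    # Same idea as in the answer above, but using Query slicing
    # (LIMIT/OFFSET under the hood) and a sentinel value instead of a flag.
    offset = 0
    while True:
        elem = None
        for elem in q[offset:offset + page_size]:
            yield elem
        if elem is None:
            # An empty page means we have walked off the end of the result.
            break
        offset += page_size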

















6 votes · answered Jul 17 '09 at 23:06 by David Raznick · edited Jul 17 '09 at 23:23













You can defer the picture column so that it is only retrieved on access. You can do it on a query-by-query basis, like

session = Session()
for p in session.query(Picture).options(sqlalchemy.orm.defer("picture")):
    print(p)

or you can do it in the mapper:

mapper(Picture, pictures, properties={
    'picture': deferred(pictures.c.picture)
})

How to do it is covered in the documentation here.

Either way will make sure that the picture data is only loaded when you access the attribute.
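For the declarative style, the equivalent looks roughly like the sketch below; the class layout and column names are illustrative assumptions, not the question's actual schema:

from sqlalchemy import Column, LargeBinary, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import deferred

Base = declarative_base()

class Picture(Base):
    __tablename__ = 'pictures'                     # assumed table name

    sha1 = Column(String(40), primary_key=True)    # assumed key column
    # Deferred: the blob is only SELECTed when .imagedata is accessed.
    imagedata = deferred(Column(LargeBinary))

This is essentially what the question's author reports doing in the comment below, with imagedata = deferred(Column(Binary)).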




























  • Thanks David, that is quite interesting. In fact I just changed my code to use from sqlalchemy.orm import deferred ... class Picture(object): ... imagedata = deferred(Column(Binary)) Which is a big improvement. However, it still takes a few seconds until the first result is output as a few 1000 Picture objects are created for the query. I would like to have those objects created one at a time while SQLAlchemy is iterating through the SQLite result rows.
    – Bluehorn
    Jul 18 '09 at 8:23










  • I think that most of the time will be spent retrieving the data from the database, not making the objects, so the query itself needs to have limits on it. So: for p in session.query(Picture)[0:5]: print(p).
    – David Raznick
    Jul 18 '09 at 18:38









