Convert lateral join query to sqlalchemy

Convert lateral join query to sqlalchemy



I am having difficulty translating a query I created in sql (postgres) into sqlalchemy. In particular, my attempted mapping in sqlalchemy is leading to absurd recursive results that will run far slower than what I originally wrote.



Given the following type of table structure:


metadata
------------------------------
primary_id - integer
secondary_count - integer
property - string (many to each primary_id)

data
-----------------------------
primary_id - integer
secondary_id - integer (many to each primary_id)
primary_json - json bytes
secondary_json - json bytes



I am trying to retrieve pairs of primary and secondary data with the such that:



The first is easy to accomplish with a join between the two tables, however the second is more complicated. The solution I cam to (see here for the explanation that led me to this solution) in raw SQL is:


SELECT primary_id, primary_json, secondary_json, secondary_count
FROM
(
SELECT primary_id, secondary_count
FROM metadata
WHERE property='whatever I want'
-- Get the "best" 1000 results
ORDER BY secondary_count DESC
LIMIT 1000
) my_primary_ids
LEFT OUTER JOIN LATERAL
(
SELECT primary_json, seondary_json
FROM data
WHERE primary_id = my_primary_ids.primary_id
-- Only return 10 pieces of secondary json per primary json
LIMIT 10
) json_content ON true;



I have tried my utmost to convert this to sqlalchemy, however I keep having the problem that the resulting query re-writes the subquery in the FROM clause of the lateral join query.


FROM



For example, the sqlalchemy code (assuming table object definitions which match the above) below is a partial solution. I think I can add the columns that are lacking (as you will see in the sql that is generated):


from sqlalchemy import true

my_prim_ids_al = (
query(Metadata.primary_id.label('primary_id'),
Metadata.secondary_count.label('secondary_count'))
.filter_by(property='whatever I want')
.order_by(Metadata.secondary_count)
.limit(1000)
.from_self()
.subquery('my_primary_ids')
)
json_content_al = (
query(Data.primary_json.label('primary_json'),
Data.secondary_json.label('secondary_json'))
.filter_by(primary_id=my_primary_ids_al.c.primary_id)
.limit(10)
.from_self()
.subquery('json_content')
.lateral()
)
joined_query = (
my_primary_ids_al
.outerjoin(json_content_al, true())
.subquery('joined_query')
)



The joined query, in long form, is the following, with the aforementioned ridiculous nested structure:


SELECT anon_1.primary_id, anon_1.secondary_count
FROM
(
SELECT metadata.primary_id AS primary_id,
metadata.secondary_count AS secondary_count
FROM metadata
WHERE metadata.property = 'whatever I want'
ORDER BY metadata.secondary_count DESC
LIMIT :param_1
) AS anon_1
LEFT OUTER JOIN LATERAL
(
SELECT anon_4.anon_3_secondary_json AS anon_3_secondary_json,
anon_4.anon_3_primary_json AS anon_3_primary_json,
FROM
(
SELECT anon_3.secondary_json AS anon_3_secondary_json,
anon_3.primary_json AS anon_3_primary_json,
FROM
(
SELECT data.secondary_json AS secondary_json,
data.primary_json AS primary_json,
FROM data
JOIN
(
SELECT anon_1.primary_id AS primary_id,
anon_1.secondary_count AS secondary_count
FROM
(
SELECT metadata.primary_id AS primary_id,
metadata.secondary_count AS secondary_count
FROM metadata
WHERE metadata.property = 'whatever I want'
ORDER BY metadata.secondary_count DESC
LIMIT :param_1
) AS anon_1
) AS primary_ids ON data.primary_id = primary_ides.primary_id
) AS anon_3
LIMIT :param_2) AS anon_4) AS anon_2 ON true



Again, I realize this is an incomplete attempt, in that not all the columns are SELECTed in the beginning, but the key issue is the sqlalchemy is creating an absurd amount of nested queries in the lateral join sub-query. This is the core issue that I have not been able to resolve, and unless it is resolved, there is little point in finishing the rest of the query.




1 Answer
1



You do not need both from_self() and subquery(), and the former is in this case messing with auto-correlation and causing the wildly recursive query, because the compiler treats the references to the 1st subquery in- and outside of the second as separate entities. Simply remove the calls to from_self() and the query will be what you are after.


from_self()


subquery()


from_self()



What happens is that when calling from_self() a new Query that selects from the former Query's SELECT statement is created. Applying subquery() then creates a subquery from that, giving 2 levels of nesting. Of course that subquery must be used in yet another query, so there will be 3 levels of nesting, at least. And when the auto-correlation fails and the subquery is included in the second one as is, you get your deeply nested query.


from_self()


Query


Query


subquery()





You're right! I had added the from_self() calls when trying to merge the queries as queries, not sub-queries, and then didn't remove them when I transitioned back to working with sub-queries. That solved my nesting problem. With the help of a coworker, I was also able to get a complete solution, including the selection of the correct columns from the joined query (and some other features that weren't relevant here). I will let him post that when he gets back from vacation though, as he should get the credit for his part. However he hadn't identified the cause of the nesting. Thanks!
– P. Greene
Aug 31 at 14:08


from_self()



Required, but never shown



Required, but never shown






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Edmonton

Crossroads (UK TV series)