Convert lateral join query to sqlalchemy
Convert lateral join query to sqlalchemy
I am having difficulty translating a query I created in sql (postgres) into sqlalchemy. In particular, my attempted mapping in sqlalchemy is leading to absurd recursive results that will run far slower than what I originally wrote.
Given the following type of table structure:
metadata
------------------------------
primary_id - integer
secondary_count - integer
property - string (many to each primary_id)
data
-----------------------------
primary_id - integer
secondary_id - integer (many to each primary_id)
primary_json - json bytes
secondary_json - json bytes
I am trying to retrieve pairs of primary and secondary data with the such that:
The first is easy to accomplish with a join between the two tables, however the second is more complicated. The solution I cam to (see here for the explanation that led me to this solution) in raw SQL is:
SELECT primary_id, primary_json, secondary_json, secondary_count
FROM
(
SELECT primary_id, secondary_count
FROM metadata
WHERE property='whatever I want'
-- Get the "best" 1000 results
ORDER BY secondary_count DESC
LIMIT 1000
) my_primary_ids
LEFT OUTER JOIN LATERAL
(
SELECT primary_json, seondary_json
FROM data
WHERE primary_id = my_primary_ids.primary_id
-- Only return 10 pieces of secondary json per primary json
LIMIT 10
) json_content ON true;
I have tried my utmost to convert this to sqlalchemy, however I keep having the problem that the resulting query re-writes the subquery in the FROM
clause of the lateral join query.
FROM
For example, the sqlalchemy code (assuming table object definitions which match the above) below is a partial solution. I think I can add the columns that are lacking (as you will see in the sql that is generated):
from sqlalchemy import true
my_prim_ids_al = (
query(Metadata.primary_id.label('primary_id'),
Metadata.secondary_count.label('secondary_count'))
.filter_by(property='whatever I want')
.order_by(Metadata.secondary_count)
.limit(1000)
.from_self()
.subquery('my_primary_ids')
)
json_content_al = (
query(Data.primary_json.label('primary_json'),
Data.secondary_json.label('secondary_json'))
.filter_by(primary_id=my_primary_ids_al.c.primary_id)
.limit(10)
.from_self()
.subquery('json_content')
.lateral()
)
joined_query = (
my_primary_ids_al
.outerjoin(json_content_al, true())
.subquery('joined_query')
)
The joined query, in long form, is the following, with the aforementioned ridiculous nested structure:
SELECT anon_1.primary_id, anon_1.secondary_count
FROM
(
SELECT metadata.primary_id AS primary_id,
metadata.secondary_count AS secondary_count
FROM metadata
WHERE metadata.property = 'whatever I want'
ORDER BY metadata.secondary_count DESC
LIMIT :param_1
) AS anon_1
LEFT OUTER JOIN LATERAL
(
SELECT anon_4.anon_3_secondary_json AS anon_3_secondary_json,
anon_4.anon_3_primary_json AS anon_3_primary_json,
FROM
(
SELECT anon_3.secondary_json AS anon_3_secondary_json,
anon_3.primary_json AS anon_3_primary_json,
FROM
(
SELECT data.secondary_json AS secondary_json,
data.primary_json AS primary_json,
FROM data
JOIN
(
SELECT anon_1.primary_id AS primary_id,
anon_1.secondary_count AS secondary_count
FROM
(
SELECT metadata.primary_id AS primary_id,
metadata.secondary_count AS secondary_count
FROM metadata
WHERE metadata.property = 'whatever I want'
ORDER BY metadata.secondary_count DESC
LIMIT :param_1
) AS anon_1
) AS primary_ids ON data.primary_id = primary_ides.primary_id
) AS anon_3
LIMIT :param_2) AS anon_4) AS anon_2 ON true
Again, I realize this is an incomplete attempt, in that not all the columns are SELECTed in the beginning, but the key issue is the sqlalchemy is creating an absurd amount of nested queries in the lateral join sub-query. This is the core issue that I have not been able to resolve, and unless it is resolved, there is little point in finishing the rest of the query.
1 Answer
1
You do not need both from_self()
and subquery()
, and the former is in this case messing with auto-correlation and causing the wildly recursive query, because the compiler treats the references to the 1st subquery in- and outside of the second as separate entities. Simply remove the calls to from_self()
and the query will be what you are after.
from_self()
subquery()
from_self()
What happens is that when calling from_self()
a new Query
that selects from the former Query
's SELECT statement is created. Applying subquery()
then creates a subquery from that, giving 2 levels of nesting. Of course that subquery must be used in yet another query, so there will be 3 levels of nesting, at least. And when the auto-correlation fails and the subquery is included in the second one as is, you get your deeply nested query.
from_self()
Query
Query
subquery()
from_self()
Required, but never shown
Required, but never shown
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
You're right! I had added the
from_self()
calls when trying to merge the queries as queries, not sub-queries, and then didn't remove them when I transitioned back to working with sub-queries. That solved my nesting problem. With the help of a coworker, I was also able to get a complete solution, including the selection of the correct columns from the joined query (and some other features that weren't relevant here). I will let him post that when he gets back from vacation though, as he should get the credit for his part. However he hadn't identified the cause of the nesting. Thanks!– P. Greene
Aug 31 at 14:08