How efficient is it to order by distance (entire table) in geodjango
How efficient is it to order by distance (entire table) in geodjango
Assume that I have the following data model
Person(models.Model):
id = models.BigAutoField(primary_key=True)
name = models.CharField(max_length=50)
location = models.PointField(srid=4326)
Assume also that I have an app that makes queries to this django backend, and the only purpose of this app is to return a (paginated) list of registered users from closest to farthest.
Currently I have this query in mind:
# here we are obtaining all users in ordered form
current_location = me.location
people = Person.objects.distance(current_location).order_by('distance')
# here we are obtaining the first X through pagination
start_index = a
end_index = b
people = people[a:b]
Although this works, it is not as fast as I would like.
I have some concerns over the speed of this query. If the table were large (1 million+) then wouldn't the database (Postgres SQL w/ PostGIS) have to measure the distance between current_location
and every location
in the database before performing an order_by
on that subsequently 1 million rows?
current_location
location
order_by
Can somebody suggest on how to properly return nearby users ordered by distance in an efficient manner?
1 Answer
1
If you want to sort every entry on that table by distance then it will be slow as expected and there is nothing that can be done (that I am aware of at this point of time and my knowledge.)!
You can make your calculation more efficient by following this steps and making some assumptions:
Enable spatial indexing on your tables. To do that in GeoDjango, follow the doc instructions and fit them to your model:
Note
In PostGIS, ST_Distance_Sphere does not limit the geometry types geographic distance queries are performed with. [4] However, these queries may take a long time, as great-circle distances must be calculated on the fly for every row in the query. This is because the spatial index on traditional geometry fields cannot be used.
For much better performance on WGS84 distance queries, consider using geography columns in your database instead because they are able to use their spatial index in distance queries. You can tell GeoDjango to use a geography column by setting geography=True
in your field definition.
geography=True
Now you can narrow down your query with some logical constrains:
Ex: My user will not look for people more than 50km from his current position.
Narrow down the search using dwithin
spatial lookup which utilizes the above mentioned spatial indexing, therefore it is pretty fast.
dwithin
Finally apply the distance
order by on the remaining rows.
distance
The final query can look like this:
current_location = me.location
people = People.objects.filter(
location__dwithin=(current_location, D(km=50))
).annotate(
distance=Distance('location', current_location)
).order_by('distance')
P.S: Rather than creating a custom pagination attempt, it is more efficient to utilize the pagination methods provided for the django views:
Or you can use Django Rest Framework and use it's pagination:
OGRGeometry
'SRID=4326;POINT (-122 37)'
dwithin
django.contrib.gis.geos.Point
current_location
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
I wanted to use
OGRGeometry
as it has a constructor that can take a location string like'SRID=4326;POINT (-122 37)'
but you can't use this as an arg todwithin
. Make sure to use adjango.contrib.gis.geos.Point
forcurrent_location
.– Harry Moreno
Aug 25 at 3:32