Discussion:
Any clever ideas for an algorithm to efficiently perform this sql???
Steve Holdoway
2010-11-19 03:25:10 UTC
Permalink
I've got a bit of a problem that would best be solved by vector
graphics, but a) it's been 30 years since I've used such stuff, and b)
MySQL doesn't really support it. I have a point with (x,y) coordinates,
and am wanting to find the closest 10 points from it. These ( 8
million ) points are stored in a table.

What I've done is to store an Id, and the sum of the squares in this
table. I can then easily get the 10 closest. As a test,

select abs(XSquaredPlusYSquared-(<value of point X^2>+<value of point
Y^2>)) as Difference, ID From Pythagoras Order By 1 asc limit 10;

Unfortunately this ignores all indexes, etc and uses a plain old
filesort to perform this function. Any ideas on how this can be
improved?

The SQL that is, not the engine, hardware... (:

Cheers

Steve
--
Steve Holdoway BSc(Hons) MNZCS <***@greengecko.co.nz>
http://www.greengecko.co.nz
MSN: ***@greengecko.co.nz
Skype: sholdowa
Robin Paulson
2010-11-19 03:47:53 UTC
Permalink
Post by Steve Holdoway
I've got a bit of a problem that would best be solved by vector
graphics, but a) it's been 30 years since I've used such stuff, and b)
MySQL doesn't really support it. I have a point with (x,y) coordinates,
and am wanting to find the closest 10 points from it. These ( 8
million ) points are stored in a table.
What I've done is to store an Id, and the sum of the squares in this
table. I can then easily get the 10 closest. As a test,
select abs(XSquaredPlusYSquared-(<value of point X^2>+<value of point
Y^2>)) as Difference, ID From Pythagoras Order By 1 asc limit 10;
Unfortunately this ignores all indexes, etc and uses a plain old
filesort to perform this function. Any ideas on how this can be
improved?
my knowledge of sql is pretty small, but piecing together bits of what
i've heard around openstreetmap, and thinking (perhaps incorrectly)
this bears a similarity to manipulating geo data, could you do this
with the postgres gis extensions?

http://en.wikipedia.org/wiki/Postgis
http://postgis.refractions.net/

yeah, you'd have to switch to postgres - i'll leave that up to you to
decide if it's a good idea or not

_______________________________________________
NZLUG mailing list ***@linux.net.nz
http://www.linux.net.nz/cgi-bin/mailman/listinfo/nzlug
Adrian Mageanu
2010-11-19 04:58:25 UTC
Permalink
Post by Steve Holdoway
I've got a bit of a problem that would best be solved by vector
graphics, but a) it's been 30 years since I've used such stuff, and b)
MySQL doesn't really support it. I have a point with (x,y) coordinates,
and am wanting to find the closest 10 points from it. These ( 8
million ) points are stored in a table.
What I've done is to store an Id, and the sum of the squares in this
table. I can then easily get the 10 closest. As a test,
select abs(XSquaredPlusYSquared-(<value of point X^2>+<value of point
Y^2>)) as Difference, ID From Pythagoras Order By 1 asc limit 10;
Unfortunately this ignores all indexes, etc and uses a plain old
filesort to perform this function. Any ideas on how this can be
improved?
Cheers
Steve
_______________________________________________
http://www.linux.net.nz/cgi-bin/mailman/listinfo/nzlug
This may be a tricky one, it is more of an art than a science and I
don't see a silver bullet.

Using EXPLAIN query, BENCHMARK(), and the query optimizer's output may
provide some clues.

I suspect that you'll have to help the engine's optimizer and tell it
more clearly what you want to do by simplifying (sic!) things and use a
JOIN between two aliases of the same table, use a WHERE clause to test
the selection's condition and use as index the very expression in the
WHERE clause. Use the OFFSET clause with LIMIT - with indexes does help.

Also use hints to force the use of the index(s) you find serves you
better if the optimizer doesn't want to do that.


Good luck,

Adrian




_______________________________________________
NZLUG mailing list ***@linux.net.nz
http://www.linux.net.nz/cgi-bin/mailman/listinfo/nzlug
Jim Tittsler
2010-11-19 05:25:40 UTC
Permalink
On Fri, Nov 19, 2010 at 16:25, Steve Holdoway <***@greengecko.co.nz> wrote:
[...]
Post by Steve Holdoway
MySQL doesn't really support it. I have a point with (x,y) coordinates,
and am wanting to find the closest 10 points from it. These ( 8
million ) points are stored in a table.
[...]
Post by Steve Holdoway
Unfortunately this ignores all indexes, etc and uses a plain old
filesort to perform this function. Any ideas on how this can be
improved?
Do you have any domain knowledge about the distribution? My first
scheme would be to pick a bounding square around x,y (taking advantage
of indexes on each) that was guaranteed to have the 10 nearest (but
hopefully well under 8 million) and then do your expensive calculation
on that selection to find out which are closest.

Failing that, you could iterate with increasing bounding square sizes
until the 10 candidates don't change... which might still be faster
than brute force comparisons.
Does that mean PostGIS is out? :-)

_______________________________________________
NZLUG mailing list ***@linux.net.nz
http://www.linux.net.nz/cgi-bin/mailman/listinfo/nzlug
Robert Coup
2010-11-21 22:31:33 UTC
Permalink
Hi Steve,
Post by Steve Holdoway
I've got a bit of a problem that would best be solved by vector
graphics, but a) it's been 30 years since I've used such stuff, and b)
MySQL doesn't really support it. I have a point with (x,y) coordinates,
and am wanting to find the closest 10 points from it. These ( 8
million ) points are stored in a table.
+1 on postgis. It is easy & fast & comprehensive & works.

Having said that, if you're stuck with MySQL you can use their basic
geometry extension:
http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html

If you can say "the closest 10 points within a maximum of 10,000 units away"
then:
- select * from mytable where MBRIntersects(buffer(point(0,0), 10000),
xy_field) order by distance(point(0,0), xy_field) limit 10;
- xy_field is a geometry field, and point(0,0) is the point you want to
test.
- tweak the 10k to a rough maximum distance between a point and it's 10
neighbours - that depends on what your units of distance are and the density
of your data. It's used to filter the data before it gets measured and
sorted so only 800 or 8000 points get manipulated rather than 8 million.

Rob :)
_______________________________________________
NZLUG mailing list ***@linux.net.nz
http://www.linux.net.nz/cgi-bin/mailman/listinfo/nzlug
Steve Holdoway
2010-11-21 23:38:53 UTC
Permalink
Thanks for the ideas. I think my original idea is rather... errr flawed
to say the least, as it will match those the closest distance from
origin, regardless of direction.

I'm looking into the mysql spatial extensions to see if they'll help.

Cheers,

Steve
Post by Robert Coup
Hi Steve,
Post by Steve Holdoway
I've got a bit of a problem that would best be solved by vector
graphics, but a) it's been 30 years since I've used such stuff, and b)
MySQL doesn't really support it. I have a point with (x,y) coordinates,
and am wanting to find the closest 10 points from it. These ( 8
million ) points are stored in a table.
+1 on postgis. It is easy & fast & comprehensive & works.
Having said that, if you're stuck with MySQL you can use their basic
http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html
If you can say "the closest 10 points within a maximum of 10,000 units away"
- select * from mytable where MBRIntersects(buffer(point(0,0), 10000),
xy_field) order by distance(point(0,0), xy_field) limit 10;
- xy_field is a geometry field, and point(0,0) is the point you want to
test.
- tweak the 10k to a rough maximum distance between a point and it's 10
neighbours - that depends on what your units of distance are and the density
of your data. It's used to filter the data before it gets measured and
sorted so only 800 or 8000 points get manipulated rather than 8 million.
Rob :)
_______________________________________________
http://www.linux.net.nz/cgi-bin/mailman/listinfo/nzlug
--
Steve Holdoway BSc(Hons) MNZCS <***@greengecko.co.nz>
http://www.greengecko.co.nz
MSN: ***@greengecko.co.nz
Skype: sholdowa
Loading...