[postgis-users] Search Engine

Simon Greener simon at spatialdbadvisor.com
Thu Aug 5 01:01:56 PDT 2010


Uli,

> Clearly, if you want to search across different tables you need a way to
> combine data in a common place. You should not do anything like search
> the tables one after the other and combine the results.

There may be a software limitation in PostgreSQL that supports this as a  technical
decision but it is foolish to suggest that "you should not do anything like
search tables one after another". Hell, I did just this in SQL Server 2008 with
800 tables and the execution of the FULLTEXT SQL is very, very fast. That is < 10 seconds.
I filter the tables by MBR first against an enhanced Geometry_Columns table before
using FULLTEXT to do the search. But it may mean that you mean selecting against
the tables using ordinary SQL with all searchable columns in the where clause as
predicates. Yes, this would be very slow and inflexible in terms of search capability.
That is why specialist search structures and functionalty have been created for most
databases.

> Even if you merge all tables into one, you will need some explicit
> mechanism (trigger!) to keep your tsvector up to date.

Correct. This is one of the reasons why this approach is unsustainable.

Also, what if someone didn't have the ability to modify the schema at will?
It might be that a lot of the use of PostGIS is as a shapefile replacement,
but real application databases do not afford that sort of happy go lucky approach
to data structuring.

> A system using triggers to build the tsvector is not so hard to
> maintain. Once you have written the triggers it simply runs and runs.
> Any time data in any relevant table changes, a trigger updates the
> tsvector, some key (gid or whatever) and maybe other data like bounding
> boxes in the one and only table that will be searched. Some challenge
> could arise, if it takes too long to rebuild your index on the tsvector.
> But normally this is not critical.
>
> We use the trigger approach for our search engine on
> http://mapmatters.org . The hardest thing there was and still is to
> optimize the way how data are combined and weighted for the tsvector (
> so how you feed the "to_tsvector" function).

I do not know much about tsvector but if you can create it external to the data
and populate it by triggers without otherwise changing the data structures of
the database then this would be a good solution.

In SQL Server 2008 FULLTEXT's indexing of tables as they change can be "declaratively"
organised. But if this is not possible with tsvector then the use of triggers may be the way to go.

regards
Simon
-- 
SpatialDB Advice and Design, Solutions Architecture and Programming,
Oracle Database 10g Administrator Certified Associate; Oracle Database 10g SQL Certified Professional
Oracle Spatial, SQL Server, PostGIS, MySQL, ArcSDE, Manifold GIS, FME, Radius Topology and Studio Specialist.
39 Cliff View Drive, Allens Rivulet, 7150, Tasmania, Australia.
Website: www.spatialdbadvisor.com
   Email: simon at spatialdbadvisor.com
   Voice: +61 362 396397
Mobile: +61 418 396391
Skype: sggreener
Longitude: 147.20515 (147° 12' 18" E)
Latitude: -43.01530 (43° 00' 55" S)
GeoHash: r22em9r98wg
NAC:W80CK 7SWP3



More information about the postgis-users mailing list