From strk at keybit.net Tue Feb 3 00:48:06 2004
From: strk at keybit.net (strk)
Date: Tue, 3 Feb 2004 09:48:06 +0100
Subject: [postgis-devel] schema support patch 5 (shp2pgsql)
Message-ID: <20040203094805.A74933@freek.keybit.net>

I've patched shp2pgsql to explicitly support a schema.
You can use . to be explicit or just
to rely on the pg's search_path at feed time or to use with pre-schema PG versions. MixeD Case Names are Supported (MySchema.MyTable) with both forget-identifiers-case (no option) and keep-identifiers-case (-k) logics. WARNINGS: - You cannot use a schema containing a dot in the name ( "My.Schema"."My.Table" is not supported ) - The schema you specify must already exist ( No create schema is issued ) Comments welcome, --strk; From strk at keybit.net Wed Feb 4 02:02:23 2004 From: strk at keybit.net (strk) Date: Wed, 4 Feb 2004 11:02:23 +0100 Subject: [postgis-devel] canonicalize_qual with PG-CVS Message-ID: <20040204110223.C84786@freek.keybit.net> postgis_estimate.c does not compile with the CVS version of postgres. I don't know much about it, can the author check it ? It's canonicalize_qual() signature that have changed.. --strk; From strk at keybit.net Mon Feb 9 04:03:10 2004 From: strk at keybit.net (strk) Date: Mon, 9 Feb 2004 13:03:10 +0100 Subject: [postgis-devel] Re: [postgis-users] pgsql2shp - problems In-Reply-To: <40276617.7030606@nlh.no>; from havard.tveite@nlh.no on Mon, Feb 09, 2004 at 11:51:03AM +0100 References: <4022152F.7040509@nlh.no> <20040206093342.B2461@freek.keybit.net> <4023D6BF.8070008@nlh.no> <20040206191726.A6412@freek.keybit.net> <40276617.7030606@nlh.no> Message-ID: <20040209130310.B29263@freek.keybit.net> havard.tveite wrote: > I have seen that BYTE_ORDER is defined wrong, but where is this value > set? On my system it's in /usr/include/endian.h Which is included dunno from where :) Maybe Paul can give us hints on how to handle this for Solaris --strk; From pramsey at refractions.net Mon Feb 9 07:47:29 2004 From: pramsey at refractions.net (Paul Ramsey) Date: Mon, 09 Feb 2004 07:47:29 -0800 Subject: [postgis-devel] Re: [postgis-users] pgsql2shp - problems In-Reply-To: <20040209130310.B29263@freek.keybit.net> Message-ID: <43FBD425-5B17-11D8-9438-000393D33C2E@refractions.net> It's not clear, unfortunately. 
The postgresql build environment defines it, and I assumed we were inheriting it from there, but I have not been able to track down where. The postgresql includes appears to be in the ports section, but are probably themselves included at another level. The sys/param stuff probably does not exist on his platform. I would continue researching for the right postgresql include file, and use that, getting rid of the sys/param.h includes. That way we can avoid OS switches (let the pgsql configure script continue to do the work) P On Monday, February 9, 2004, at 04:03 AM, strk wrote: > havard.tveite wrote: >> I have seen that BYTE_ORDER is defined wrong, but where is this value >> set? > > On my system it's in /usr/include/endian.h > Which is included dunno from where :) > > Maybe Paul can give us hints on how to handle this for Solaris > > --strk; > _______________________________________________ > postgis-devel mailing list > postgis-devel at postgis.refractions.net > http://postgis.refractions.net/mailman/listinfo/postgis-devel > Paul Ramsey Refractions Research Email: pramsey at refractions.net Phone: (250) 885-0632 From havard.tveite at nlh.no Mon Feb 9 09:15:06 2004 From: havard.tveite at nlh.no (Havard Tveite) Date: Mon, 09 Feb 2004 18:15:06 +0100 Subject: [postgis-devel] Re: [postgis-users] pgsql2shp - problems In-Reply-To: <20040209130310.B29263@freek.keybit.net> References: <4022152F.7040509@nlh.no> <20040206093342.B2461@freek.keybit.net> <4023D6BF.8070008@nlh.no> <20040206191726.A6412@freek.keybit.net> <40276617.7030606@nlh.no> <20040209130310.B29263@freek.keybit.net> Message-ID: <4027C01A.3050705@nlh.no> strk wrote: > havard.tveite wrote: > >>I have seen that BYTE_ORDER is defined wrong, but where is this value >>set? > > > On my system it's in /usr/include/endian.h This does not seem to be a PostGIS issue. I found that BYTE_ORDER is defined in src/include/port/solaris.h in PostgreSQL source directory. 
The postgresql configure finds out that it is a sparc solaris architecture, so I don't know what can be the problem.

configure:1296: checking build system type
configure:1314: result: sparc-sun-solaris2.6
configure:1322: checking host system type
configure:1336: result: sparc-sun-solaris2.6
configure:1346: checking which template to use
configure:1446: result: solaris

I guess I will have to search elsewhere. Thank you for all your help. Now I know a bit more...

> Which is included dunno from where :)
>
> Maybe Paul can give us hints on how to handle this for Solaris
>
> --strk;

--
Håvard Tveite
Department of Mathematical Sciences and Technology
Agricultural University of Norway
Drøbakveien 14, POBox 5003, N-1432 Ås, NORWAY
Phone: +47 64948857 Fax: +47 64948810
http://www.nlh.no/imt

From dblasby at refractions.net Mon Feb 9 09:26:52 2004
From: dblasby at refractions.net (David Blasby)
Date: Mon, 09 Feb 2004 09:26:52 -0800
Subject: [postgis-devel] Determining Endianess
Message-ID: <4027C2DC.1030500@refractions.net>

Here's a piece of code that directly tests the endianness of the machine. Basically, it looks at how the number 1 is stored. In the 4 byte int, if it's NDR, it will be in the 1st byte. If it's XDR, it'll be in the last byte.

    int test = 1;

    if ( (((char *)(&test))[0]) == 1)
    {
        /* NDR (little-endian): least significant byte is stored first */
    }
    else
    {
        /* XDR (big-endian): least significant byte is stored last */
    }

From strk at keybit.net Mon Feb 9 10:55:12 2004
From: strk at keybit.net (strk)
Date: Mon, 9 Feb 2004 19:55:12 +0100
Subject: [postgis-devel] Determining Endianess
In-Reply-To: <4027C2DC.1030500@refractions.net>; from dblasby@refractions.net on Mon, Feb 09, 2004 at 09:26:52AM -0800
References: <4027C2DC.1030500@refractions.net>
Message-ID: <20040209195512.B32528@freek.keybit.net>

I've included your code in pgsql2shp.
It would be needed for postgis_inout.c and postgis_gist*.c where its computation time will be more noticeable. I've asked to the pgsql-hackers about using their computed BYTE_ORDER. Yet, I'd see better having Dave's code as an m4 macro and postgis switch to autoconf. dblasby wrote: > Here's a piece of code that directly tests the endian of the machine. > Basically, it looks at how the number 1 is stored. In the 4 byte int, > if its NDR, it will be in the 1st byte. If its XDR, it'll be in the > last byte. > > int test = 1; > > if ( (((char *)(&test))[0]) == 1) > { > //NDR > } > else > { > //XDR > } > > _______________________________________________ > postgis-devel mailing list > postgis-devel at postgis.refractions.net > http://postgis.refractions.net/mailman/listinfo/postgis-devel From pramsey at refractions.net Mon Feb 9 10:57:31 2004 From: pramsey at refractions.net (Paul Ramsey) Date: Mon, 09 Feb 2004 10:57:31 -0800 Subject: [postgis-devel] Determining Endianess In-Reply-To: <20040209195512.B32528@freek.keybit.net> References: <4027C2DC.1030500@refractions.net> <20040209195512.B32528@freek.keybit.net> Message-ID: <4027D81B.7090803@refractions.net> Better to just find out where postgresql computes it and take it from there. We should not be testing for endianness at runtime. strk wrote: > I've included your code in pgsql2shp. > It would be needed for postgis_inout.c > and postgis_gist*.c where its computation > time will be more noticeable. > > I've asked to the pgsql-hackers about using their computed > BYTE_ORDER. Yet, I'd see better having Dave's code as an m4 macro > and postgis switch to autoconf. > > dblasby wrote: > >>Here's a piece of code that directly tests the endian of the machine. >>Basically, it looks at how the number 1 is stored. In the 4 byte int, >>if its NDR, it will be in the 1st byte. If its XDR, it'll be in the >>last byte. 
>>
>> int test = 1;
>>
>> if ( (((char *)(&test))[0]) == 1)
>> {
>>     //NDR
>> }
>> else
>> {
>>     //XDR
>> }

--
  __
 /
 | Paul Ramsey
 | Refractions Research
 | Email: pramsey at refractions.net
 | Phone: (250) 885-0632
 \_

From strk at keybit.net Mon Feb 9 12:34:17 2004
From: strk at keybit.net (strk)
Date: Mon, 9 Feb 2004 21:34:17 +0100
Subject: [postgis-devel] Determining Endianess
In-Reply-To: <4027D81B.7090803@refractions.net>; from pramsey@refractions.net on Mon, Feb 09, 2004 at 10:57:31AM -0800
References: <4027C2DC.1030500@refractions.net> <20040209195512.B32528@freek.keybit.net> <4027D81B.7090803@refractions.net>
Message-ID: <20040209213417.A33309@freek.keybit.net>

I'm sorry... this was simpler than I thought:

    postgres.h includes c.h
    c.h includes pg_config_os.h
    pg_config_os.h is a symlink to ..../port/.h (solaris.h for solaris os)

This means that postgis_inout.c and postgis_geos.c SHOULD NOT have problems with byte ordering, while pgsql2shp can solve them by including postgres.h (which would be the only reason for including it).

Including postgres.h means pgsql2shp cannot be built without the postgres server headers around (this already holds true for all of postgis except the loader and dumper). Keeping runtime detection will slightly decrease the run speed of the dumper (only initialization is involved).

Which is better: including postgres.h or keeping runtime detection? Comments welcome.
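
A minimal sketch of the runtime-detection option weighed above, not code taken from pgsql2shp: the probe runs once and the result is cached, so the cost is paid only at initialization. The names host_byte_order, flip_u32 and read_u32 are made up for this illustration, and the XDR = 0 / NDR = 1 numbering is borrowed from the WKB byte-order flag.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same numbering as the WKB byte-order flag: XDR = 0, NDR = 1. */
enum byte_order { ORDER_XDR = 0, ORDER_NDR = 1 };

/* Hypothetical helper: probe the host once, then reuse the cached answer.
 * The cache is a plain static, which is fine for a single-threaded dumper. */
static enum byte_order host_byte_order(void)
{
    static int cached = -1;             /* -1 means "not probed yet" */
    if (cached < 0) {
        uint32_t probe = 1;
        unsigned char first;
        memcpy(&first, &probe, 1);      /* lowest-addressed byte of the int */
        cached = (first == 1) ? ORDER_NDR : ORDER_XDR;
    }
    return (enum byte_order) cached;
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t flip_u32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Read a 32-bit integer stored in byte order 'src', whatever the host is. */
static uint32_t read_u32(const unsigned char *buf, enum byte_order src)
{
    uint32_t v;
    assert(buf != NULL);
    memcpy(&v, buf, sizeof v);
    return (src == host_byte_order()) ? v : flip_u32(v);
}
```

With this shape the callers never branch on a compile-time BYTE_ORDER define; they state the order of the data they hold and the helper decides whether a swap is needed.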
--strk; From strk at keybit.net Tue Feb 10 12:01:42 2004 From: strk at keybit.net (strk) Date: Tue, 10 Feb 2004 21:01:42 +0100 Subject: [postgis-devel] terr@terralogic.net on "Determining Endianess" Message-ID: <20040210210142.B41291@freek.keybit.net> ----- Forwarded message from terr at terralogic.net ----- From: terr at terralogic.net To: strk Subject: Re: [HACKERS] BYTE_ORDER for contribs Message-ID: <20040209151957.F28050 at saturn.terralogic.net> Byte order can be determined trivially at run time. IMHO it should not be done via defines. If someone disagrees please send me an example so I can see why there is a differning opinion on the matter. On Mon, Feb 09, 2004 at 07:51:22PM +0100, strk wrote: > Is there a quick way to use the BYTE_ORDER define > as set by pgsql ? I can't find an "entry point" > include for it. > > It's needed for postgis (problems with Solaris BYTE_ORDER). > > --strk; > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo at postgresql.org ----- End forwarded message ----- From strk at keybit.net Tue Feb 10 12:02:48 2004 From: strk at keybit.net (strk) Date: Tue, 10 Feb 2004 21:02:48 +0100 Subject: [postgis-devel] tgl@sss.pgh.pa.us on "Determining Endianess " Message-ID: <20040210210248.C41291@freek.keybit.net> ----- Forwarded message from Tom Lane ----- From: Tom Lane To: strk Cc: pgsql-hackers at postgresql.org Subject: Re: [HACKERS] BYTE_ORDER for contribs Date: Mon, 09 Feb 2004 20:21:04 -0500 Message-ID: <21012.1076376064 at sss.pgh.pa.us> strk writes: > Is there a quick way to use the BYTE_ORDER define > as set by pgsql ? I can't find an "entry point" > include for it. BYTE_ORDER isn't actually used anywhere in the backend, and hasn't been for a long time, so I wouldn't count on it to be right. > It's needed for postgis (problems with Solaris BYTE_ORDER). My recommendation is to rethink your code. 
There's usually a way to avoid having a compile-time dependency. regards, tom lane ----- End forwarded message ----- From dblasby at refractions.net Wed Feb 18 15:33:29 2004 From: dblasby at refractions.net (David Blasby) Date: Wed, 18 Feb 2004 15:33:29 -0800 Subject: [postgis-devel] Improving Performance Message-ID: <4033F649.8080906@refractions.net> I was recently doing some spatial joins between a table with 10,000 points to a table with 17,000,000 lines. Here's some of my thoughts on the matter. In the middle of this message, I give 4 ideas for improving speed. The last part of the message is a more in-depth analysis of one of these techniques. The query I was running looked something like: EXPLAIN ANALYSE SELECT ... FROM all_lake_points, csn_edges WHERE csn_edges.the_Geom && expand(all_lake_points.the_Geom,0.5) ; I repeatedly ran the process on one of our machines, and got the following times (it returns about 40,000 rows): Total runtime: 113813.259 ms (very low CPU utilization - <5%) Total runtime: 51843.412 ms Total runtime: 37013.140 ms Total runtime: 27280.564 ms Total runtime: 17084.181 ms Total runtime: 12869.891 ms Total runtime: 10867.587 ms Total runtime: 10803.591 ms (very high CPU utilization - >70%) On another machine: Total runtime: 163636.58 msec Total runtime: 65834.27 msec Total runtime: 9506.53 msec Total runtime: 7880.42 msec Starting and stopping the postmaster between queries doesnt affect these times, so the diminishing time is a result of the OS caching the Spatial Index and actual csn_edges tuples. The size of the spatial index is approximately O(n log n), where n= number of geometries being index. 
Since the index is composed of a BOX (xmin,ymin, xmax, ymax -- all doubles) of size 32, the actual index size is: 17,000,000 rows * 32bytes/row + 17,000,000 rows * GiST overhead + (index hiearchy overhead) = 520 mb + 340mb + hierarchy overhead = 860 mb + hierarchy overhead The GiST overhead is quite large - see ggeometry_compress() - its about 21 bytes. The index hierarchy account for the "internal" nodes in the GiST tree. I looked in the postgresql data directory, and it appears that the index is actually 1.2 Gb. The table csn_edges is about 14 Gb. The query actually works like this: get a row from all_lake_points make a bounding box Search csn_edges GiST index using the bounding box: GiST init Traverse GiST index For each matching GiST entry get tuple from csn_edges Experimentation (with explain analyse) leads me to believe that an uncached index search takes about 100 to 400 ms. A search with the index mostly cached is about 100ms, and with the index and actual tuples cached, about 1.5ms. This is also verified with the timings given above. The very low CPU utilization numbers indicate that the uncached version is IO bound, while the cached version is CPU bound. Running the same query serveral times hints to the OS to cache those disk pages. If a GiST entry is 32 (box) + 21 (GISTENTRY) =53 bytes, there are about 154 on an 8k disk page. Immediately, I see four ways to improve performance: 1. Have the OS allocate more space to the disk cache 2. Have the index be smaller 3. Have the actual tuples be smaller on disk 4. Order the table so that nearby geometries are very close together on the disk. #1 is up to the system administrator #3 we can change the size of a geometry by a small amount or go to a WKB-type datatype like I've talked about in a few discussions. This will not be a major difference unless your individual geometries only have a few points in them. #4 There's no easy way to do this. 
One way would be to make a Quad-tree type index for your geometries and insert them into the database on a bucket-by-bucket basis. (The PostGIS stats package does something like this in the histogram2d functions.) This makes queries faster because it should minimize the number of *different* disk pages a query will access. It still reads the same number of tuples off the disk, but these tuples are more likely to be on the same page. For example, if your tuples are 1kb long, the disk page size is 8kb, and you're selecting 1000 tuples, then with random placement you'll likely read 1000 disk pages, but with a high degree of auto-correlation you could read as few as 1000/8 = 125 pages. This is definitely worth pursuing - but it's a fair bit of work and would probably be an external program instead of an SQL command. It's hard to say how much this would improve performance.

I'd like to talk more about #2 (smaller index). I'm proposing we change the index type (currently BOX - 32 bytes) to a:

    typedef struct BOX2DFLOAT_
    {
        float xmin;
        float xmax;
        float ymin;
        float ymax;
    } BOX2DFLOAT;

This saves 16 bytes per entry, meaning we go from 154 GiST entries/page to 221 (43% more). My original thought on this was that it would be a 50% saving (16 vs 32), but the 21 byte GiST overhead reduces this. In the end my GiST index goes from about 1.2 Gb to 840 Mb...

We'll need to write a function that takes a BOX3D and converts it to a BOX2DFLOAT. The BOX2DFLOAT's xmin/ymin must never be greater than the BOX3D's xmin/ymin (and its xmax/ymax never smaller) to ensure that all the queries work properly. This will make all the search boxes "bigger" by about 1m if the coordinates are in the 10,000,000 range (a float4 cannot represent small differences in numbers as well as a float8). For boxes near the origin, we'll only see a few micrometers difference. This means the bounding box queries will always work correctly, but they might grab a few more nearby "candidate" geometries.
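
The outward rounding described above can be sketched as follows. This is an illustration only, not the conversion function the proposal calls for: the struct and function names are hypothetical, and the code assumes IEEE 754 binary32 floats and finite coordinates within float range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { double xmin, ymin, xmax, ymax; } box2d_double;  /* hypothetical */
typedef struct { float  xmin, ymin, xmax, ymax; } box2d_float;   /* hypothetical */

/* Move f one representable step toward +infinity (dir > 0) or
 * -infinity (dir < 0). Assumes IEEE 754 binary32 and a finite input. */
static float step_float(float f, int dir)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if (f == 0.0f)
        bits = (dir > 0) ? 0x00000001u : 0x80000001u;  /* smallest subnormal */
    else if ((f > 0.0f) == (dir > 0))
        bits++;                     /* moving away from zero */
    else
        bits--;                     /* moving toward zero */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Largest float that is <= d. */
static float to_float_down(double d)
{
    float f = (float) d;
    if ((double) f > d)
        f = step_float(f, -1);
    return f;
}

/* Smallest float that is >= d. */
static float to_float_up(double d)
{
    float f = (float) d;
    if ((double) f < d)
        f = step_float(f, +1);
    return f;
}

/* The float box always contains the double box it was built from. */
static box2d_float box_to_float(const box2d_double *b)
{
    box2d_float out;
    assert(b->xmin <= b->xmax && b->ymin <= b->ymax);
    out.xmin = to_float_down(b->xmin);
    out.ymin = to_float_down(b->ymin);
    out.xmax = to_float_up(b->xmax);
    out.ymax = to_float_up(b->ymax);
    return out;
}
```

At a coordinate such as 10000000.3 the two helpers return 10000000 and 10000001, which is the roughly 1 unit of growth mentioned above; values that a float can hold exactly are returned unchanged.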
Next, we'll have to write a set of BOX2DFLOAT high-level GiST support functions, and a set of low-level GiST support functions. These are all quite easy. At the end of this, the index query should be a bit faster, require about 40% less disk space, and require about 40% less cache space. In the end, the biggest wait will be pulling individual csn_edges tuples from the disk (see comments on #4, above). dave For DAVE's reference only: ------------------------- explain analyse SELECT all_lake_points.gid, all_lake_points.the_Geom ,csn_edges.code, csn_edges.edge_id,csn_edges.group_code ,startpoint(csn_edges.the_geom), endpoint(csn_edges.the_geom) WHERE all_lake_points.gid >= 650001 and all_lake_points.gid <= 660000 and all_lake_points.code != 1400 and csn_edges.the_Geom && expand(all_lake_points.the_Geom,0.5) and csn_edges.code in (1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1410, 1425, 1450, 1475, 2000, 2300, 6010) ; From strk at keybit.net Fri Feb 20 02:45:49 2004 From: strk at keybit.net (strk) Date: Fri, 20 Feb 2004 11:45:49 +0100 Subject: [postgis-devel] Re: ANALYZE patch applied to 7.5 CVS In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE502656A@openmanage>; from m.cave-ayland@webbased.co.uk on Thu, Feb 19, 2004 at 02:59:02PM -0000 References: <8F4A22E017460A458DB7BBAB65CA6AE517BB96@openmanage> <8F4A22E017460A458DB7BBAB65CA6AE502656A@openmanage> Message-ID: <20040220114549.A36633@freek.keybit.net> m.cave-ayland wrote: > Hi strk, > > > > Do you mean the ANALYZE function should return the number of > > rows that will be fed to the custom stats builder ? I guess > > for postgis this would be the full dataset, are there cases > > where this does not hold true ? > > Yes and yes. Due to the way that the statistics process works, there is > no way that we can pipe an entire dataset into a stats builder function. 
> This is because the rows requested are shared between all the analysis > function for all the different column types so each ANALYZE would then > iterate over the entire dataset multiple times..... this approach just > would not scale :( Stat builders could work taking a single value and updating a state (like aggregates do). That way you could make a single scan for all the columns. > The existing selectivity functions use sampling to approximate the > distribution of the data, and they work with few problems, so all we > have to do is to provide enough information to indicate whether our plan > is better than the inbuilt ones, i.e. at a minimum level of accuracy > required to do this. Remember the aim here is to provide an informed > *estimate* to the planner. Dave, do you think your histogram-based-estimator would work if the histogram has been built on a random subset of the geometries in the table ? I guess we can get to the whole set, but that would be rude :| > If you look at analyze.c the current functions use an estimate of 300 * > the statistics target of the column as the number of rows to sample. I > would imagine that this would be a fairly good starting point to begin > with, although we should aim to give a 'reasonable' estimate using the > in-built statistics default value of 10. How is the statistics target set ? --strk; From dblasby at refractions.net Fri Feb 20 12:22:36 2004 From: dblasby at refractions.net (David Blasby) Date: Fri, 20 Feb 2004 12:22:36 -0800 Subject: [postgis-devel] Light WeightLight Weight Geometry (LWGEOM) Proposal Message-ID: <40366C8C.5020308@refractions.net> As per our current discussions, I'm proposing a new Light-Weight Geometry. This will be 'as-well-as' the current PostGIS so no one will lose anything. I'm not going to back-port this, so we'll only support it in postgresql 7.4+. 
Disk Representation (serialized form)

    int32 size;  // postgresql variable-length requirement
    char type;   // this type (see below)

Where the 8-bit 'type' is defined bit-wise as:

    xSBDtttt

WHERE
    x = unused
    S = 4 byte SRID attached (0 = not attached (-1), 1 = attached)
    B = bounding box attached (0 = no, 1 = yes) (32 bytes)
    D = dimensionality (0 = 2d, 1 = 3d)
    tttt = actual type (as per the WKB type):

    enum wkbGeometryType {
        wkbPoint = 1,
        wkbLineString = 2,
        wkbPolygon = 3,
        wkbMultiPoint = 4,
        wkbMultiLineString = 5,
        wkbMultiPolygon = 6,
        wkbGeometryCollection = 7
    };

In general, data will be exactly like the 3d-extended WKB representation (except there's no endian flag and the WKB type is defined as above).

The bounding box flag is an optional component for large geometries - for small geometries (<1000 points) it will not be present. This allows for small storage of small geometries (where bounding boxes can be quickly calculated on-the-fly) but enhanced performance for large geometries. This will probably be a compile-time option.

Examples (c.f. OGC SF SQL definition of WKB (section 3.3.2.6))
-------------------------------------------------------------

A. 2D point w/o bounding box
    size = 21 bytes
    type: S=0,B=0,D=0, tttt= 1
    X Y

B. 3D point w/o bounding box
    size = 29 bytes
    type: S=0,B=0,D=1, tttt= 1
    X Y Z

C. 2D point WITH bounding box
   (you would never put one on a point, but this is just an example)
    size = 53 bytes
    type: S=0,B=1,D=0, tttt= 1
    xmin ymin xmax ymax
    X Y

D. 2D line String w/o bounding box
    size = npoints*16 + 9
    type: S=0,B=0,D=0, tttt= 2
    npoints
    X1 Y1 X2 Y2 ...

E. 2D line String with bounding box
    size = npoints*16 + 9 + 32
    type: S=0,B=1,D=0, tttt= 2
    xmin ymin xmax ymax
    npoints
    X1 Y1 X2 Y2 ...

F. 3D polygon w/o bounding box
   NOTE: I haven't explicitly put in the ogcLinearRing
    size =
    type: S=0,B=1,D=0, tttt= 3
    nrings
    npoints in ring1
    X1 Y1 X2 Y2 ...
    npoints in ring3
    X1 Y1 X2 Y2 ...
    ...

G.
2d multilines string w/o bounding boxes
   NOTE: this is like the OGC spec - we duplicate type info in the sub-geometries. This is arguably not a good idea, but it does allow us to treat all the multi* and geometrycollection types equivalently. It also allows us to represent GeometryCollections of GeometryCollections (which postgis doesn't support).
    size =
    type: S=0,B=0,D=0, tttt= 5
    nlines
    type: S=0,B=0,D=0, tttt= 2
    npoints in line 1
    X1 Y1 X2 Y2 ....
    type: S=0,B=0,D=0, tttt= 2
    npoints in line 2
    X1 Y1 X2 Y2 ....

G. 2d multilines string with main bounding box
    size =
    type: S=0,B=0,D=0, tttt= 5
    xmin ymin xmax ymax
    nlines
    type: S=0,B=0,D=0, tttt= 2
    npoints in line 1
    X1 Y1 X2 Y2 ....
    type: S=0,B=0,D=0, tttt= 2
    npoints in line 2
    X1 Y1 X2 Y2 ....

G. 2d multilines string with main and sub bounding boxes
   NOTE: since our types are defined in a recursive manner, this type is possible. I don't think we should construct them in general.
    size =
    type: S=0,B=0,D=0, tttt= 5
    xmin ymin xmax ymax
    nlines
    type: S=0,B=1,D=0, tttt= 2
    xmin ymin xmax ymax
    npoints in line 1
    X1 Y1 X2 Y2 ....
    type: S=0,B=1,D=0, tttt= 2
    xmin ymin xmax ymax
    npoints in line 2
    X1 Y1 X2 Y2 ....

H. 2D point w/o bounding box (with SRID)
    size = 25 bytes
    type: S=1,B=0,D=0, tttt= 1
    SRID
    X Y

I. 3D point w/o bounding box (with SRID)
    size = 33 bytes
    type: S=1,B=0,D=1, tttt= 1
    SRID
    X Y Z

J. 2D point WITH bounding box (with SRID)
   (you would never put one on a point, but this is just an example)
   (note: SRID comes before bounding box)
    size = 57 bytes
    type: S=1,B=1,D=0, tttt= 1
    SRID
    xmin ymin xmax ymax
    X Y

Other notes:

Canonical form will be very much like the current WKB type (ie. looks like '000000FF001A..'). This means your pg_dumps will look strange, but you'll find it faster and there will not be any numeric drift as you move to and from WKT.

We'll need to write a WKB to LWGEOM and LWGEOM to WKB. Since LWGEOM is very close to WKB, this should be simple. To get to WKT, we can do a LWGEOM->WKB->PostGIS GEOMETRY->WKT.
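
The xSBDtttt type byte from the proposal can be sketched in C as below. The macro and function names are invented for this illustration and are not part of PostGIS; the bit positions follow the layout given in the proposal, and the size formula is checked only against the point examples (A, B, C, H, I, J).

```c
#include <assert.h>
#include <stddef.h>

/* Bit layout xSBDtttt, most significant bit first (names are hypothetical). */
#define LW_FLAG_SRID 0x40   /* S: 4-byte SRID attached */
#define LW_FLAG_BBOX 0x20   /* B: 32-byte bounding box attached */
#define LW_FLAG_3D   0x10   /* D: 0 = 2d, 1 = 3d */
#define LW_TYPE_MASK 0x0F   /* tttt: WKB geometry type, 1..7 */

/* Pack the three flags and the WKB type into one byte. */
static unsigned char lw_make_type(int has_srid, int has_bbox, int is_3d,
                                  int wkb_type)
{
    assert(wkb_type >= 1 && wkb_type <= 7);
    return (unsigned char) ((has_srid ? LW_FLAG_SRID : 0) |
                            (has_bbox ? LW_FLAG_BBOX : 0) |
                            (is_3d    ? LW_FLAG_3D   : 0) |
                            (wkb_type & LW_TYPE_MASK));
}

/* Extract the WKB geometry type from a type byte. */
static int lw_wkb_type(unsigned char type)
{
    return type & LW_TYPE_MASK;
}

/* Serialized size in bytes of a single point with the given type byte. */
static size_t lw_point_size(unsigned char type)
{
    size_t size = 4 + 1;                        /* int32 size + type byte */
    if (type & LW_FLAG_SRID) size += 4;         /* optional SRID */
    if (type & LW_FLAG_BBOX) size += 32;        /* optional bounding box */
    size += ((type & LW_FLAG_3D) ? 3 : 2) * 8;  /* X Y [Z] as doubles */
    return size;
}
```

Run against the flag combinations of the point examples, lw_point_size gives 21, 29, 53, 25, 33 and 57 bytes, matching the sizes listed in the proposal.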
For parsing WKT, we can WKT->PostGIS GEOMETRY->WKB->LWGEOM. The PostGIS conversion functions already exist, so this will be very easy.

We'll also need to write something to convert a LWGEOM to a bounding box plus all the indexing support functions (based on BOX2DFLOAT4s).

One of the design issues I have with PostGIS is that all the analysis functions deal directly with the serialized GEOMETRY form. This makes them more complex and difficult to maintain. I suggest we have souped-up versions of the current PostGIS geometry types (i.e. POLYGON3D, LINESTRING3D, POINT3D) for LWGEOM (i.e. LW_POLYGON, LW_LINE, LW_POINT) which would hide things like 2d vs 3d and make construction easier.

What think?

dave

From strk at keybit.net Mon Feb 23 04:58:48 2004
From: strk at keybit.net (strk)
Date: Mon, 23 Feb 2004 13:58:48 +0100
Subject: [postgis-devel] geometry stats
Message-ID: <20040223135848.A61734@freek.keybit.net>

Hi Dave,
I've prepared and committed a skeleton for PG75 integrated stats support.

What is done is:

    1) histogram creation based on 300*attstatstarget sample rows (if available).

    2) null fraction computation

    3) average width of column values: SUM(samplegeom->size) / not_null samples

What remains to do now is:

    1) fill the histogram values (float4).

    2) estimate the histogram.

I've seen the current code is pretty fuzzy, and I've experienced this kind of algos to require many iterations on fine-tuning.

I'd like to move the 'tunable' parts in the estimator, keeping the builder as strict as possible.

Since we will use float instead of integers, we could use a number in the range 0-1 to express the factor of overlapping between a sample feature's box and a histogram cell. Currently 1 is added to the cell value if at least 5% of a feature overlaps it (correct me if I'm wrong). Finally we should 'normalize' the histogram dividing the value of each cell by the number of not-null (or total) samples handled. This should give a tune-free histogram, what do you think? Mark?
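
The fractional-overlap idea above could look roughly like this. It is a sketch under stated assumptions, not the committed stats code: the struct, the function names and the 2x2 grid are hypothetical, and sample boxes are assumed to have positive area (points and axis-parallel lines would need a separate rule).

```c
#include <assert.h>

#define CELLS_PER_SIDE 2    /* hypothetical; kept tiny so results are easy to check */

typedef struct { double xmin, ymin, xmax, ymax; } box2d;

typedef struct {
    box2d extent;           /* area covered by the histogram */
    int   nsamples;         /* not-null samples added so far */
    float cell[CELLS_PER_SIDE * CELLS_PER_SIDE];   /* float4 cell values */
} histogram2d_sketch;

/* Length of the overlap between intervals [a0,a1] and [b0,b1], or 0. */
static double overlap_len(double a0, double a1, double b0, double b1)
{
    double lo = (a0 > b0) ? a0 : b0;
    double hi = (a1 < b1) ? a1 : b1;
    return (hi > lo) ? hi - lo : 0.0;
}

/* Fraction (0..1) of the feature box f that falls inside cell c. */
static double overlap_fraction(const box2d *f, const box2d *c)
{
    double area = (f->xmax - f->xmin) * (f->ymax - f->ymin);
    assert(area > 0.0);     /* assumption: no degenerate boxes */
    return overlap_len(f->xmin, f->xmax, c->xmin, c->xmax) *
           overlap_len(f->ymin, f->ymax, c->ymin, c->ymax) / area;
}

/* Add one sample box: each cell receives the fraction that overlaps it. */
static void histogram_add(histogram2d_sketch *h, const box2d *f)
{
    double cw = (h->extent.xmax - h->extent.xmin) / CELLS_PER_SIDE;
    double ch = (h->extent.ymax - h->extent.ymin) / CELLS_PER_SIDE;
    int x, y;
    for (y = 0; y < CELLS_PER_SIDE; y++) {
        for (x = 0; x < CELLS_PER_SIDE; x++) {
            box2d c;
            c.xmin = h->extent.xmin + x * cw;
            c.ymin = h->extent.ymin + y * ch;
            c.xmax = c.xmin + cw;
            c.ymax = c.ymin + ch;
            h->cell[y * CELLS_PER_SIDE + x] += (float) overlap_fraction(f, &c);
        }
    }
    h->nsamples++;
}

/* Divide every cell by the number of samples handled. */
static void histogram_normalize(histogram2d_sketch *h)
{
    int i;
    if (h->nsamples == 0)
        return;
    for (i = 0; i < CELLS_PER_SIDE * CELLS_PER_SIDE; i++)
        h->cell[i] /= (float) h->nsamples;
}
```

Because each sample distributes a total weight of at most 1 across the grid, the normalized cells sum to at most 1 regardless of the sample count, which is what removes the 5% threshold as a tuning knob from the builder.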
Then the estimator will need a change too... but I'd like to discuss this later. --strk; From dblasby at refractions.net Mon Feb 23 10:03:33 2004 From: dblasby at refractions.net (David Blasby) Date: Mon, 23 Feb 2004 10:03:33 -0800 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <20040223135848.A61734@freek.keybit.net> References: <20040223135848.A61734@freek.keybit.net> Message-ID: <403A4075.4090804@refractions.net> > Finally we should 'normalize' the histogram dividing the value of each > cell by the number of not-null (or total) samples handled. This should > give a tune-free histogram, what do you think? Mark? I dont think you should normalize the histogram - just keep the 'normalization' value in the histogram. I think this will make it easier to keep the histogram updated. {load in the current histogram, or create a new one if there isnt currently one} 1. Is the bounding box the new set of features outside the extents of the histogram? YES - need to re-jig histogram. This is relatively difficult. 2. add 1 to each of the histogram's cells that a feature's box overlaps by >5%. 3. have the histogram keep track of the total number of new features in it. Every 'vacuum analyse' you do should make the histogram "smarter". dave From strk at keybit.net Mon Feb 23 13:27:48 2004 From: strk at keybit.net (strk) Date: Mon, 23 Feb 2004 22:27:48 +0100 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <403A4075.4090804@refractions.net>; from dblasby@refractions.net on Mon, Feb 23, 2004 at 10:03:33AM -0800 References: <20040223135848.A61734@freek.keybit.net> <403A4075.4090804@refractions.net> Message-ID: <20040223222748.A65666@freek.keybit.net> dblasby wrote: > Every 'vacuum analyse' you do should make the histogram "smarter". I don't think this is feasable. We won't be able to tell whether the existing histogram is still valid (rows deletion/update). 
To make the histogram as smart as it is now we can just require the whole dataset to be feed to the stats builder routine, but that is something people seems to be trying avoiding :) --strk; From m.cave-ayland at webbased.co.uk Tue Feb 24 10:14:01 2004 From: m.cave-ayland at webbased.co.uk (Mark Cave-Ayland) Date: Tue, 24 Feb 2004 18:14:01 -0000 Subject: [postgis-devel] RE: geometry stats In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE517BD20@openmanage> Message-ID: <8F4A22E017460A458DB7BBAB65CA6AE5026573@openmanage> Hi strk, > -----Original Message----- > From: strk [mailto:strk at keybit.net] > Sent: 23 February 2004 12:59 > To: David Blasby; Mark Cave-Ayland > Cc: postgis-devel at postgis.refractions.net > Subject: geometry stats > > > Hi Dave, > I've prepared and committed a skeleton for > PG75 integrated stats support. > > What is done is: > > 1) histogram creation based on 300*attstatstarget > sample rows (if available). > > 2) null fraction computation > > 3) average width of column values: > SUM(samplegeom->size) / not_null samples > > What remains to do now is: > > 1) fill the histogram values (float4). > > 2) estimate the histogram. Wow! I downloaded the hourly snapshot from CVS and had a look at the changes you had put in. This really is great stuff - I think I'm experiencing one of Dave's 'warm fuzzies' when someone works on a patch. The work you've done on this is greatly appreciated :) > I've seen the current code is pretty fuzzy, and I've > experienced this kind of algos to require many iterations on > fine-tuning. > > I'd like to move the 'tunable' parts in the estimator, > keeping the builder as strict as possible. Yup that sounds good. As you've probably realised, the only thing that can't be configured in the estimator is the number of boxes per side :) > Since we will use float instead of integers, we could use a > number in the range 0-1 to express the factor of overlapping > between a sample feature's box and an histogram cell. 
> Currently 1 is added to the cell value if at least 5% of a > feature overlaps it (correct me if I'm wrong). Finally we > should 'normalize' the histogram dividing the value of each > cell by the number of not-null (or total) samples handled. > This should give a tune-free histogram, what do you think? Mark? Sounds good to me - i.e. if all geometries sampled were to fit within a single box then the value of that histogram box would be 1.0. I don't think Dave's idea of updating the histogram will work, since as you rightly point out, you don't know which rows have been deleted between samples... > Then the estimator will need a change too... but I'd like to > discuss this later. Was there anything in particular you had in mind? The only case I can think of is that since we are working on a sample then we may have to calculate the value for some extents that lie outside the bounds of our histogram. However since the data is randomly sampled from the whole table then we can be fairly sure that this number will need to be a small fraction of the number of rows in the table - but we'll probably have to determine the best value by trial and error. I'll try and refresh my memory of the algorithm tomorrow to see if things will still work as they should working on a sample. Keep up the good work! Mark. --- Mark Cave-Ayland Webbased Ltd. Tamar Science Park Derriford Plymouth PL6 8BX England Tel: +44 (0)1752 764445 Fax: +44 (0)1752 764446 This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. 
From m.cave-ayland at webbased.co.uk Tue Feb 24 10:18:01 2004 From: m.cave-ayland at webbased.co.uk (Mark Cave-Ayland) Date: Tue, 24 Feb 2004 18:18:01 -0000 Subject: [postgis-devel] RE: geometry stats In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE517BD67@openmanage> Message-ID: <8F4A22E017460A458DB7BBAB65CA6AE512D14E@openmanage> > -----Original Message----- > From: strk [mailto:strk at keybit.net] > Sent: 23 February 2004 21:28 > To: David Blasby > Cc: Mark Cave-Ayland; postgis-devel at postgis.refractions.net > Subject: Re: geometry stats (cut) > To make the histogram as smart as it is now we can just > require the whole dataset to be feed to the stats builder > routine, but that is something people seems to be trying avoiding :) I know, I know, but there is a good technical argument for it! :) Mark. --- Mark Cave-Ayland Webbased Ltd. Tamar Science Park Derriford Plymouth PL6 8BX England Tel: +44 (0)1752 764445 Fax: +44 (0)1752 764446 This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. From strk at keybit.net Tue Feb 24 12:16:29 2004 From: strk at keybit.net (strk) Date: Tue, 24 Feb 2004 21:16:29 +0100 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE5026573@openmanage>; from m.cave-ayland@webbased.co.uk on Tue, Feb 24, 2004 at 06:14:01PM -0000 References: <8F4A22E017460A458DB7BBAB65CA6AE517BD20@openmanage> <8F4A22E017460A458DB7BBAB65CA6AE5026573@openmanage> Message-ID: <20040224211629.B74238@freek.keybit.net> m.cave-ayland wrote: > Hi strk, > > Yup that sounds good. As you've probably realised, the only thing that > can't be configured in the estimator is the number of boxes per side :) Actually I've tought we could set it instead... 
The geometry_analyze could set it based on geometry type or some other euristhic, and pass it to the compute_geometry_stat via stats->extra_data. But I'd discuss this when we have something working. > > Then the estimator will need a change too... but I'd like to > > discuss this later. > > Was there anything in particular you had in mind? The only case I can > think of is that since we are working on a sample then we may have to > calculate the value for some extents that lie outside the bounds of our > histogram. However since the data is randomly sampled from the whole > table then we can be fairly sure that this number will need to be a > small fraction of the number of rows in the table - but we'll probably > have to determine the best value by trial and error. I'm more concerned about knowing the actual extent of the whole dataset, so to be able to return 0.0 for constants just outside of it... anyway a quick curve from the borders to infinitely far could do the right thing. What I'm stuck now is writing the estimator. I've seen a lot of static funx in selfuncs.c, and I don't know wheter to copy them or just parse the args List myself (I'm not confident with pg data structure - could you help out ?) > Keep up the good work! Thanks, I need this :) --strk; From strk at keybit.net Tue Feb 24 16:46:33 2004 From: strk at keybit.net (strk) Date: Wed, 25 Feb 2004 01:46:33 +0100 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE5026573@openmanage>; from m.cave-ayland@webbased.co.uk on Tue, Feb 24, 2004 at 06:14:01PM -0000 References: <8F4A22E017460A458DB7BBAB65CA6AE517BD20@openmanage> <8F4A22E017460A458DB7BBAB65CA6AE5026573@openmanage> Message-ID: <20040225014633.A76675@freek.keybit.net> Ok. I'm at a good point (I hope). All the code is in those 3 monolithic functions: geometry_analyze() compute_geometry_stats() postgis_gist_sel() <-- warning, there are two of these.. 
The new postgis_gist_sel() seems now able to get to the stats, at least in the simple cases the_geom && 'box', 'box && the_geom. I please anyone to test more complex cases: NOTICEs should arise when problems are encountered. I have lots of questions opened: 1) should we consider the null fraction when making the estimation or is it something postgresql will handle itself ? 2) should we handle the const && const case evaluating it and returning either 1 when true or (what?) when false ? 3) is there a test suite to help tuning the estimator ? [ this is for Dave ] 4) why does the returned estimate change the result of a query ? [ box_contained() bug ? ] 5) how do we compose the bits we have: cell_area, box_area, AOI, cell_value, avgFeaturesArea to get a nice picture ;) ? 5bis) how do we see the nice picture ? (see point 3) The code has been committed. It's highly experimental, but should not conflict with builds other then against PG75 (CVS). Thanks for your attention. --strk; From dblasby at refractions.net Tue Feb 24 16:58:31 2004 From: dblasby at refractions.net (David Blasby) Date: Tue, 24 Feb 2004 16:58:31 -0800 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <20040225014633.A76675@freek.keybit.net> References: <8F4A22E017460A458DB7BBAB65CA6AE517BD20@openmanage> <8F4A22E017460A458DB7BBAB65CA6AE5026573@openmanage> <20040225014633.A76675@freek.keybit.net> Message-ID: <403BF337.5090701@refractions.net> > 3) is there a test suite to help tuning the estimator ? > [ this is for Dave ] Not really - I tested it by looking at the data in the histogram (to make sure the histogram was being calculated properly - see below), then looking at the actual stats (EXPLAIN ANALYSE) vs actual results. I have *no* idea how well it will work based on a sample size of 3000. You could have it do the old-style calculates as well as the new-style and have it spit out both results ( "elog(NOTICE,"...",..,..,..);"). 
There was some work done with random bounding boxes and checking the estimates, but I cannot find it. SELECT count(*) FROM
WHERE the_Geom && SELECT estimate_histogram(, ); > 5) how do we compose the bits we have: cell_area, box_area, AOI, > cell_value, avgFeaturesArea to get a nice picture ;) ? > > 5bis) how do we see the nice picture ? (see point 3) The original stats package had an explode_histogram() function that would create a table with the contents of the histogram. I used pgsql2shp and loaded the histogram and dataset into arc-view and checked the results. If you use arc-view (or jump) you can colour the histogram polygons based on the # of items in the box - this makes it easy to view. I used a roads dataset to test since its so highly skewed. Turning both layers on at the same time should show a dark (lots of geometries) box around the vicinity of the cities, and light-coloured in other areas. //explode_histogram2d(histogram2d, tablename::text) // executes CREATE TABLE tablename (the_geom geometry, id int, hits int, percent float) // then populates it // DOES NOT UPDATE GEOMETRY_COLUMNS The geometry is a polygon (representing the box). Make sure you drop the exploded histogram table before you run the function or you'll get an error. dave From chodgson at refractions.net Tue Feb 24 17:00:42 2004 From: chodgson at refractions.net (chodgson at refractions.net) Date: Tue, 24 Feb 2004 17:00:42 -0800 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <20040225014633.A76675@freek.keybit.net> References: <8F4A22E017460A458DB7BBAB65CA6AE517BD20@openmanage> <8F4A22E017460A458DB7BBAB65CA6AE5026573@openmanage> <20040225014633.A76675@freek.keybit.net> Message-ID: <1077670842.403bf3ba6c36e@hydra> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From strk at keybit.net Wed Feb 25 02:20:56 2004 From: strk at keybit.net (strk) Date: Wed, 25 Feb 2004 11:20:56 +0100 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <1077670842.403bf3ba6c36e@hydra>; from chodgson@refractions.net on Tue, Feb 24, 2004 at 05:00:42PM -0800 References: <8F4A22E017460A458DB7BBAB65CA6AE517BD20@openmanage> <8F4A22E017460A458DB7BBAB65CA6AE5026573@openmanage> <20040225014633.A76675@freek.keybit.net> <1077670842.403bf3ba6c36e@hydra> Message-ID: <20040225112056.A79990@freek.keybit.net> chodgson wrote: > > > 4) why does the returned estimate change the result of a query ? > > [ box_contained() bug ? ] > > If the index "contains" function is different than the && operator, (ie. at > least one of them is bugged/incorrect) then the results will be different if a > sequence scan is used than if an index scan is used... and the results of the > estimate affect the decision of whether or not to use the index. There seem to > be enough complaints about not getting the correct results that I think one of > the box_contain[s|ed] functions must be broken. Taking a look at the code it seems the the index overlap function is box_overlap taken from external source, while sequencial scan uses geometry_overlap(). It might be better to get the full control over geometry operations (they might have changed something inthe box containment funx). I won't be able to go on with estimation tuning as long as the bug is there, but my brain is absorbed by the selectivity estimation. Any help on this ? Mark ? 
--strk; From strk at keybit.net Wed Feb 25 04:10:39 2004 From: strk at keybit.net (strk) Date: Wed, 25 Feb 2004 13:10:39 +0100 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <20040225014633.A76675@freek.keybit.net>; from strk@keybit.net on Wed, Feb 25, 2004 at 01:46:33AM +0100 References: <8F4A22E017460A458DB7BBAB65CA6AE517BD20@openmanage> <8F4A22E017460A458DB7BBAB65CA6AE5026573@openmanage> <20040225014633.A76675@freek.keybit.net> Message-ID: <20040225131039.A81049@freek.keybit.net> I've added some comments in the code to explain the problem of estimating the histogram. I've also proposed a solution, which has been implemented. Briefly: the sum of cell values would be ok if either the search box or the table features overlap only a single histogram cell, but when having a bigger search_box AND average feature we need to take into account the fact that we are counting every single feature overlap more then once, thus we need to divide the estimated value by a factor. I experimented using the minimun number of histogram cells occupied between average feature and search box. Comments ? --strk; PS: I've also fixed a bug in the histogram creation process PPS: I'm having problems with memory allocation, inclusion or exclusion of elog()s makes a segfault appera/disappear :! From m.cave-ayland at webbased.co.uk Wed Feb 25 04:10:32 2004 From: m.cave-ayland at webbased.co.uk (Mark Cave-Ayland) Date: Wed, 25 Feb 2004 12:10:32 -0000 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE517BE0E@openmanage> Message-ID: <8F4A22E017460A458DB7BBAB65CA6AE5026576@openmanage> Hi strk, Fantastic! A 'working' implementation of the stats collector already :) > Taking a look at the code it seems the the index overlap > function is box_overlap taken from external source, while > sequencial scan uses geometry_overlap(). 
It might be better > to get the full control over geometry operations (they might > have changed something inthe box containment funx). Yeah I think this has been well discussed already (see the thread http://postgis.refractions.net/pipermail/postgis-users/2004-February/003 961.html) The way to test this would to be to try your same && query firstly using the index scan on a small number of tuples and verifying the results. Then peform the same query after typing SET ENABLE_SEQSCAN = 'f' and compare the results. If they are different then the implementation of box_overlap() is not the same as geometry_overlap() and this could be the problem you are seeing. I must admit that I would be quite surprised if the overlap() operator were broken this late in the game. Looking at the comments in postgis_estimate.c, do you really get different query results when uncommenting your 'selectivity = 0.9'? Is this true if you try different values, say from 0.1 to 0.9? I suppose the other possibility is that some memory is being overwritten where it shouldn't be. I've had another quick look at your new code and the only thing I've noticed at first sight is that PostgreSQL passes float4 values by reference(!) and not by value which may cause the get_attstatlot()/free_attrstatslot() not to handle memory as they should. Then again I would expect this to have been more likely to seg fault rather than upset the query.... Also, Tom is correct in that the selectivity returned only affects the choice of the planner and not the number of tuples returned. I'm currently working to tight deadlines (both in and out of work) which means I don't have as much time as I like to experiment with this. I hope that in a week's time I shall have time to look at this in more detail. Cheers, Mark. --- Mark Cave-Ayland Webbased Ltd. 
Tamar Science Park Derriford Plymouth PL6 8BX England Tel: +44 (0)1752 764445 Fax: +44 (0)1752 764446 This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. From strk at keybit.net Wed Feb 25 05:02:13 2004 From: strk at keybit.net (strk) Date: Wed, 25 Feb 2004 14:02:13 +0100 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE5026576@openmanage>; from m.cave-ayland@webbased.co.uk on Wed, Feb 25, 2004 at 12:10:32PM -0000 References: <8F4A22E017460A458DB7BBAB65CA6AE517BE0E@openmanage> <8F4A22E017460A458DB7BBAB65CA6AE5026576@openmanage> Message-ID: <20040225140213.A81378@freek.keybit.net> m.cave-ayland wrote: > Hi strk, > > Fantastic! A 'working' implementation of the stats collector already :) > > > Taking a look at the code it seems the the index overlap > > function is box_overlap taken from external source, while > > sequencial scan uses geometry_overlap(). It might be better > > to get the full control over geometry operations (they might > > have changed something inthe box containment funx). > > Yeah I think this has been well discussed already (see the thread > http://postgis.refractions.net/pipermail/postgis-users/2004-February/003 > 961.html) > > The way to test this would to be to try your same && query firstly using > the index scan on a small number of tuples and verifying the results. > Then peform the same query after typing SET ENABLE_SEQSCAN = 'f' and > compare the results. If they are different then the implementation of > box_overlap() is not the same as geometry_overlap() and this could be > the problem you are seeing. I must admit that I would be quite surprised > if the overlap() operator were broken this late in the game. 
> > Looking at the comments in postgis_estimate.c, do you really get > different query results when uncommenting your 'selectivity = 0.9'? Is > this true if you try different values, say from 0.1 to 0.9? Yes, box_overlap() is not the same as geometry_overlap(). There are no traces of box_overlap() in postgis code though, so the late broke might depends on external factors. > I suppose the other possibility is that some memory is being overwritten > where it shouldn't be. I've had another quick look at your new code and > the only thing I've noticed at first sight is that PostgreSQL passes > float4 values by reference(!) and not by value which may cause the > get_attstatlot()/free_attrstatslot() not to handle memory as they > should. Then again I would expect this to have been more likely to seg > fault rather than upset the query.... I actually get a random segfault, isn't this a get_attstatslot bug, anyway ? > I'm currently working to tight deadlines (both in and out of work) which > means I don't have as much time as I like to experiment with this. I > hope that in a week's time I shall have time to look at this in more > detail. I guess I'll be waiting for you on this :) --strk; From m.cave-ayland at webbased.co.uk Wed Feb 25 05:14:50 2004 From: m.cave-ayland at webbased.co.uk (Mark Cave-Ayland) Date: Wed, 25 Feb 2004 13:14:50 -0000 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE517BE16@openmanage> Message-ID: <8F4A22E017460A458DB7BBAB65CA6AE5026577@openmanage> Hi strk, > > Looking at the comments in postgis_estimate.c, do you really get > > different query results when uncommenting your 'selectivity > = 0.9'? Is > > this true if you try different values, say from 0.1 to 0.9? > > Yes, box_overlap() is not the same as geometry_overlap(). > There are no traces of box_overlap() in postgis code though, > so the late broke might depends on external factors. 
Yeah, box_overlap() is a routine from the internal PostgreSQL geometric functions (see src/backend/utils/adt/geo_ops.c). The current way it works is that a BOX3D is converted into a BOX and then box_overlap() is called. Yuck. The fix in the thread I mentioned before was that I wrote my own equivalent of the contains() operator and added it into PostGIS instead of calling the PostgreSQL contains/contained functions which just appeared to be plain broken. So can you clarify again: are you seeing different results depending upon whether a sequential scan/index scan is being used, or do you get different results with the same query plan but returning a different value for the selectivity? > > I suppose the other possibility is that some memory is being > > overwritten where it shouldn't be. I've had another quick > look at your > > new code and the only thing I've noticed at first sight is that > > PostgreSQL passes float4 values by reference(!) and not by > value which > > may cause the > > get_attstatlot()/free_attrstatslot() not to handle memory as they > > should. Then again I would expect this to have been more > likely to seg > > fault rather than upset the query.... Actually don't worry about this - it looks as if this is only the case for stavalues and not stanumbers - so we should be safe. > I actually get a random segfault, isn't this a > get_attstatslot bug, anyway ? This is in the estimation function, right? Can you narrow it down to a particular area at all, i.e. if you just have the function do nothing except call get_attstatslot(), free_attstatslot() and return the default selectivity, does it still crash? I think this would be a good test. > I guess I'll be waiting for you on this :) I'll try and help out as much as I can in the meantime. Cheers, Mark. --- Mark Cave-Ayland Webbased Ltd. 
Tamar Science Park Derriford Plymouth PL6 8BX England Tel: +44 (0)1752 764445 Fax: +44 (0)1752 764446 This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. From strk at keybit.net Wed Feb 25 05:29:43 2004 From: strk at keybit.net (strk) Date: Wed, 25 Feb 2004 14:29:43 +0100 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE5026577@openmanage>; from m.cave-ayland@webbased.co.uk on Wed, Feb 25, 2004 at 01:14:50PM -0000 References: <8F4A22E017460A458DB7BBAB65CA6AE517BE16@openmanage> <8F4A22E017460A458DB7BBAB65CA6AE5026577@openmanage> Message-ID: <20040225142943.B81378@freek.keybit.net> m.cave-ayland wrote: > Hi strk, > > Yeah, box_overlap() is a routine from the internal PostgreSQL geometric > functions (see src/backend/utils/adt/geo_ops.c). The current way it > works is that a BOX3D is converted into a BOX and then box_overlap() is > called. Yuck. The fix in the thread I mentioned before was that I wrote > my own equivalent of the contains() operator and added it into PostGIS > instead of calling the PostgreSQL contains/contained functions which > just appeared to be plain broken. http://homepages.zen.co.uk/zen9662/postgis/ Not Found. Anyway, I got your point and added a pgbox_overlap() function inside postgis_gist_72.c that makes the same computation as geometry_overlap(). Its seem to be working now. I think you were right calling for a completely 'internal' handling of the GiST, for now I just made a quick fix, but I'm sure we should review the code and make some cleanups there... > > I actually get a random segfault, isn't this a > > get_attstatslot bug, anyway ? > > This is in the estimation function, right? Can you narrow it down to a > particular area at all, i.e. 
if you just have the function do nothing > except call get_attstatslot(), free_attstatslot() and return the default > selectivity, does it still crash? I think this would be a good test. Nope, the segfault is there including or excluding elog() calls by setting DEBUG_GEOMETRY_STATS to 1 (no segfault) or 2 (segfault). It happens at the end of the analyze; command, right before the end of the function, so it's like postgresql is faced with wild pointers. If you want to take a look change this: @@ -1432,5 +1432,5 @@ -#if 0 & DEBUG_GEOMETRY_STATS > 1 +#if DEBUG_GEOMETRY_STATS > 1 and set DEBUG_GEOMETRY_STATS to 2 Allocation here happens for the GEOM_STATS struct itself in the stats->anl_context context. Other allocations are made in the 'current' context but are not meant to leave after the end of the function. --strk; From m.cave-ayland at webbased.co.uk Thu Feb 26 03:53:26 2004 From: m.cave-ayland at webbased.co.uk (Mark Cave-Ayland) Date: Thu, 26 Feb 2004 11:53:26 -0000 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE517BE19@openmanage> Message-ID: <8F4A22E017460A458DB7BBAB65CA6AE5026578@openmanage> Hi strk, > -----Original Message----- > From: strk [mailto:strk at keybit.net] > Sent: 25 February 2004 13:30 > To: Mark Cave-Ayland > Cc: 'PostGIS Development Discussion' > Subject: Re: [postgis-devel] Re: geometry stats > > > m.cave-ayland wrote: > > Hi strk, > > > > Yeah, box_overlap() is a routine from the internal PostgreSQL > > geometric functions (see src/backend/utils/adt/geo_ops.c). > > > > The current way it works is that a BOX3D is converted into a BOX and then > > box_overlap() is called. Yuck. The fix in the thread I mentioned > > before was that I wrote my own equivalent of the contains() operator > > and added it into PostGIS instead of calling the PostgreSQL > > contains/contained functions which just appeared to be plain broken. > > http://homepages.zen.co.uk/zen9662/postgis/ Not Found. 
> Anyway, I got your point and added a pgbox_overlap() function inside postgis_gist_72.c > that makes the same computation as geometry_overlap(). Its seem to be working now. I > think you were right calling for a completely 'internal' handling of the GiST, for now I > just made a quick fix, but I'm sure we should review the code and make some cleanups > there... Glad that you managed to understand what I did even though the URL is broken (looks like my ISP changed it to http://www.zen9662.zen.co.uk/postgis). I totally agree that we should use our own RTree functions in case the PostgreSQL ones are broken. From the discussions I remember that Dave wanted to keep the implementation the same as rtree_gist to help debugging; however it's looking like in this case we need to make an exception. If you found the overlap function was broken i.e. matching more geometries than it should have been, then this may give a speed-up to spatial && queries :) > Nope, the segfault is there including or excluding elog() calls by setting > DEBUG_GEOMETRY_STATS to 1 (no segfault) or 2 (segfault). It happens at the end of the > > analyze; command, right before the end of the function, so it's like postgresql is faced > with wild pointers. > > If you want to take a look change this: > @@ -1432,5 +1432,5 @@ > -#if 0 & DEBUG_GEOMETRY_STATS > 1 > +#if DEBUG_GEOMETRY_STATS > 1 > and set DEBUG_GEOMETRY_STATS to 2 > > Allocation here happens for the GEOM_STATS struct itself in the > stats->anl_context context. Other allocations are made in > the 'current' context but are not meant to leave after > the end of the function. OK you twisted my arm.... ;) I installed the latest CVS PostgreSQL and PostGIS and found out what the problem was - it's in the normalise code below: > /* > * Normalize histogram > * > * We divide each histogram cell value > * by the number of samples examined. 
> *
> */
> for (i=0; i<boxesPerSide*boxesPerSide; i++)
>     geomstats->value[i] /= samplerows;

Because of the i++ (postincrement), i is now pointing to boxesPerSide*boxesPerSide (which is now outside the allocated memory space)

>
> #if DEBUG_GEOMETRY_STATS > 1
> {
>     int x, y;
>     for (x=0; x<bps; x++)
>     {
>         for (y=0; y<bps; y++)
>         {
>             geomstats->value[i] = 0;
              ^^^^^^^^^^^^^^^^^^^^^^^^
This causes your segfault because it is writing outside allocated palloc'd memory. Is this supposed to set the last value of the histogram to be 0? (remember this is outside the normalisation loop). The cure is either to remove this line, or change it to:

    geomstats->value[i - 1] = 0;

>
>
>             elog(NOTICE, " histo[%d,%d] = %.15f", x, y, geomstats->value[x+y*bps]);
>         }
>     }
> }
> #endif

Hope this helps! Keep up the good work!

Mark.

---
Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England
Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446

This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person.
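The out-of-bounds write diagnosed in this message comes down to the value a loop index holds once a `for` loop has finished. A minimal sketch, using a hypothetical `normalize_cells()` helper rather than the actual PostGIS code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Divide each of the n cells by nsamples and return the loop index as it
 * stands once the loop has finished.
 */
static size_t normalize_cells(float *value, size_t n, double nsamples)
{
    size_t i;

    assert(nsamples > 0);

    for (i = 0; i < n; i++)
        value[i] /= (float) nsamples;

    /*
     * Here i == n, so value[i] is one element past the end of the array.
     * Writing to it is undefined behaviour; with palloc'd memory it can
     * corrupt allocator bookkeeping and crash much later. The last valid
     * cell is value[n - 1].
     */
    return i;
}
```

Because the damage lands outside the array, the crash can move around when unrelated code such as an elog() call is added or removed, which fits the symptoms reported earlier in the thread.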
From m.cave-ayland at webbased.co.uk Thu Feb 26 08:05:04 2004
From: m.cave-ayland at webbased.co.uk (Mark Cave-Ayland)
Date: Thu, 26 Feb 2004 16:05:04 -0000
Subject: [postgis-devel] Re: geometry stats
In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE517BE60@openmanage>
Message-ID: <8F4A22E017460A458DB7BBAB65CA6AE502657A@openmanage>

Hi strk,

I've just managed to get my first successful query working using PG7.5 CVS and your new ANALYZE code for PostGIS :)

Along the way I found a bug in line 1187 of postgis_estimate.c:

    overlapping_cells = (x_idx_max-x_idx_min+1) * (y_idx_max-x_idx_min+1);
                                                             ^^^

should actually read:

    overlapping_cells = (x_idx_max-x_idx_min+1) * (y_idx_max-y_idx_min+1);

Using this (and removing the offending line from my previous post), I imported a dataset of UK settlements in our database, created the relevant indexes and ran an ANALYZE, followed by EXPLAIN ANALYZE:

                                   QUERY PLAN
------------------------------------------------------------------------
 Index Scan using us_gm_27700_point_gm_geom_idx on us_gm_27700_point (cost=0.00..652.72 rows=192 width=319) (actual time=0.228..0.948 rows=105 loops=1)
   Index Cond: (gm_geom && 'SRID=-1;BOX3D(204029 39000 0,254000 68500 0)'::geometry)
 Total runtime: 1.529 ms
(3 rows)

Fantastic! The estimates on the spatial index don't look too bad at all, good enough I would say to decide which is the correct index to use :) Well done strk!

Cheers,

Mark.

---
Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England
Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446

This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person.
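The corrected overlapping_cells computation can be sketched in isolation as follows. The `Box2` type and the `cell_index()` and `overlapping_cells()` helpers are hypothetical names for illustration; only the index formula and the final product follow the lines quoted in this thread.

```c
#include <assert.h>

/* Hypothetical 2D box type for illustration; not the PostGIS BOX struct. */
typedef struct { double xmin, ymin, xmax, ymax; } Box2;

/*
 * Map a coordinate to a cell index in [0, bps-1] along one axis.
 * A coordinate lying exactly on the upper edge of the extent maps to bps
 * before clamping, so the upper clamp matters even for boxes that lie
 * entirely within the extent.
 */
static int cell_index(double v, double min, double width, int bps)
{
    double t;

    assert(width > 0 && bps > 0);

    t = (v - min) / width * bps;
    if (t < 0)
        return 0;
    if (t >= bps)
        return bps - 1;
    return (int) t;
}

/*
 * Number of histogram cells touched by the query box. A query box lying
 * completely outside the extent is clamped onto the border cells; this
 * sketch does not detect that case.
 */
static int overlapping_cells(Box2 query, Box2 extent, int bps)
{
    double geow = extent.xmax - extent.xmin;
    double geoh = extent.ymax - extent.ymin;
    int x_idx_min = cell_index(query.xmin, extent.xmin, geow, bps);
    int x_idx_max = cell_index(query.xmax, extent.xmin, geow, bps);
    int y_idx_min = cell_index(query.ymin, extent.ymin, geoh, bps);
    int y_idx_max = cell_index(query.ymax, extent.ymin, geoh, bps);

    /* Each factor must use the min and max index of the same axis. */
    return (x_idx_max - x_idx_min + 1) * (y_idx_max - y_idx_min + 1);
}
```

For a 10x10 histogram over (0,0)-(10,10), a query box of (2.5,7.5)-(4.5,8.5) spans columns 2..4 and rows 7..8, giving 6 cells; the mistyped version would compute 3 * (8-2+1) = 21 for the same box.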
From strk at keybit.net Thu Feb 26 08:29:11 2004 From: strk at keybit.net (strk) Date: Thu, 26 Feb 2004 17:29:11 +0100 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE5026578@openmanage>; from m.cave-ayland@webbased.co.uk on Thu, Feb 26, 2004 at 11:53:26AM -0000 References: <8F4A22E017460A458DB7BBAB65CA6AE517BE19@openmanage> <8F4A22E017460A458DB7BBAB65CA6AE5026578@openmanage> Message-ID: <20040226172911.A91264@freek.keybit.net> m.cave-ayland wrote: > Hi strk, > > > /* > > #if DEBUG_GEOMETRY_STATS > 1 > > { > > int x, y; > > for (x=0; x > { > > for (y=0; y > { > > geomstats->value[i] = 0; > > ^^^^^^^^^^^^^^^^^^^^^^^^ > This causes your segfault because it is writing outside allocated > palloc'd memory. Is this supposed to set the last value of the histogram > to be 0? (remember this is outside the normalisation loop). The cure is > either to remove this line, or change it to: > > geomstats->value[i - 1] = 0; Actually that line is useless :) I removed it. > Hope this helps! Keep up the good work! Thank you. --strk; From strk at keybit.net Thu Feb 26 08:48:21 2004 From: strk at keybit.net (strk) Date: Thu, 26 Feb 2004 17:48:21 +0100 Subject: [postgis-devel] Re: geometry stats In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE502657A@openmanage>; from m.cave-ayland@webbased.co.uk on Thu, Feb 26, 2004 at 04:05:04PM -0000 References: <8F4A22E017460A458DB7BBAB65CA6AE517BE60@openmanage> <8F4A22E017460A458DB7BBAB65CA6AE502657A@openmanage> Message-ID: <20040226174821.B91264@freek.keybit.net> m.cave-ayland wrote: > Hi strk, > > I've just managed to get my first successful query working using PG7.5 > CVS and your new ANALYZE code for PostGIS :) > > > Along the way I found a bug in line 1187 of postgis_estimate.c: > > overlapping_cells = (x_idx_max-x_idx_min+1) * > (y_idx_max-x_idx_min+1); > > ^^^ > > should actually read: > > overlapping_cells = (x_idx_max-x_idx_min+1) * > (y_idx_max-y_idx_min+1); > Fixed. 
In the histogram builder the code goes (1355):

    x_idx_min = (box->low.x-geomstats->xmin) / geow * bps;
    if (x_idx_min <0) x_idx_min = 0;
    if (x_idx_min >= bps) x_idx_min = bps-1;

We are at the second scan of the rows here. The first scan computed geomstats->xmin, geomstats->xmax, geomstats->ymin, geomstats->ymax looking at each sample bounding box. Doesn't this mean there is no need to check for boundary overflow ?? I'd remove the checks.

In the histogram evaluator (1094):

    x_idx_max = (box->high.x-geomstats->xmin) / geow * bps;
    if (x_idx_max <0)
    {
        // should increment the value somehow
        x_idx_max = 0;
    }
    if (x_idx_max >= bps )
    {
        // should increment the value somehow
        x_idx_max = bps-1;
    }

Do you think it's worth actually incrementing the value ? What kind of curve should we apply in that case ? What we are detecting is that the query box goes outside the histogram extent; should we find the values of approximate cells ? The worst thing that can happen is that the estimator gives a smaller estimation, which eventually could be 0.0. Would that estimation just force the planner to make use of the index or would it make it think there are no rows in the result ?

In the histogram evaluator: I've re-introduced dividing the histogram cell value by the fraction of the cell that really overlaps the query box.

Keep up with the good support ! :)

--strk;

From strk at keybit.net Thu Feb 26 08:52:29 2004
From: strk at keybit.net (strk)
Date: Thu, 26 Feb 2004 17:52:29 +0100
Subject: [postgis-devel] sql files proliferation
Message-ID: <20040226175229.A91634@freek.keybit.net>

The postgis*sql* files are becoming a major part of the distribution (as number of files). What about having a single sql for each postgres version ? Or - better - use some preprocessing to keep in common things that are common and split things that need to be split ?
I need to move geometry_column creation out of _common_ into each of version-specific files because for PG75 there is no need of attrelid oid, varattnum int and stats histogram2d; this already happended for schema support changes (addGeometryColumn, dropGeoemtryColumn, fix_geometry_columns) and is getting boring. --strk; From chodgson at refractions.net Thu Feb 26 09:37:22 2004 From: chodgson at refractions.net (chodgson at refractions.net) Date: Thu, 26 Feb 2004 09:37:22 -0800 Subject: [postgis-devel] sql files proliferation In-Reply-To: <20040226175229.A91634@freek.keybit.net> References: <20040226175229.A91634@freek.keybit.net> Message-ID: <1077817042.403e2ed29bddd@hydra> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From pramsey at refractions.net Thu Feb 26 10:09:28 2004 From: pramsey at refractions.net (Paul Ramsey) Date: Thu, 26 Feb 2004 10:09:28 -0800 Subject: [postgis-devel] Re: sql files proliferation In-Reply-To: <20040226175229.A91634@freek.keybit.net> References: <20040226175229.A91634@freek.keybit.net> Message-ID: <403E3658.9070304@refractions.net> I wonder if we can use cpp and simple #ifdef stuff to do the preprocessing. Would be nice. P. strk wrote: > The postgis*sql* files are becoming a major part > of the distribution (as number of files). > What about having a single sql for each postgres version ? > Or - better - use some preprocessing to keep in common > things that are common and spilt things that need to be > splitted ? > > I need to move geometry_column creation out of _common_ > into each of version-specific files because for PG75 there > is no need of attrelid oid, varattnum int and stats histogram2d; > this already happended for schema support changes (addGeometryColumn, > dropGeoemtryColumn, fix_geometry_columns) and is getting boring. 
>
> --strk;

-- 
       __
      /
      | Paul Ramsey
      | Refractions Research
      | Email: pramsey at refractions.net
      | Phone: (250) 885-0632
      \_

From strk at keybit.net  Thu Feb 26 10:16:23 2004
From: strk at keybit.net (strk)
Date: Thu, 26 Feb 2004 19:16:23 +0100
Subject: [postgis-devel] Re: sql files proliferation
In-Reply-To: <403E3658.9070304@refractions.net>; from pramsey@refractions.net on Thu, Feb 26, 2004 at 10:09:28AM -0800
References: <20040226175229.A91634@freek.keybit.net> <403E3658.9070304@refractions.net>
Message-ID: <20040226191623.A92340@freek.keybit.net>

pramsey wrote:
> I wonder if we can use cpp and simple #ifdef stuff to do the
> preprocessing. Would be nice.
> P.

I made some use of cpp with languages other than C.
The main problem is that cpp thinks #-lines are overlooked by
the compilers. The GNU cpp version accepts -P to omit linemarkers:

	-P  Inhibit generation of linemarkers in the output from
	    the preprocessor. This might be useful when running
	    the preprocessor on something that is not C code, and
	    will be sent to a program which might be confused by
	    the linemarkers.

... but that might not be the case for every cpp.

Can people try that (or similar) to solve the linemarker problem ?

--strk;

> strk wrote:
>
> > The postgis*sql* files are becoming a major part
> > of the distribution (as number of files).
> > What about having a single sql for each postgres version ?
> > Or - better - use some preprocessing to keep in common
> > things that are common and spilt things that need to be
> > splitted ?
> >
> > I need to move geometry_column creation out of _common_
> > into each of version-specific files because for PG75 there
> > is no need of attrelid oid, varattnum int and stats histogram2d;
> > this already happended for schema support changes (addGeometryColumn,
> > dropGeoemtryColumn, fix_geometry_columns) and is getting boring.
> >
> > --strk;
>
>
> -- 
>        __
>       /
>       | Paul Ramsey
>       | Refractions Research
>       | Email: pramsey at refractions.net
>       | Phone: (250) 885-0632
>       \_
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel

From dblasby at refractions.net  Thu Feb 26 10:15:58 2004
From: dblasby at refractions.net (David Blasby)
Date: Thu, 26 Feb 2004 10:15:58 -0800
Subject: [postgis-devel] Re: geometry stats
In-Reply-To: <8F4A22E017460A458DB7BBAB65CA6AE5026578@openmanage>
References: <8F4A22E017460A458DB7BBAB65CA6AE5026578@openmanage>
Message-ID: <403E37DE.4070407@refractions.net>

I don't think there's ever been a report of '&&' (overlap) being broken.

I don't think the original postgresql rtree implementations
(rtree_box_overlap()) are incorrect (the postgis ones are based on them
and I checked them for mistakes).
It's more likely in the rtree_internal_consistent() function.

We should test Oleg and Teodor's polygon gist rtree implementation to
see if it correctly handles index searches like "@@" (etc). Then we can
have them check it out and patch our implementation, since it's pretty
much *exactly* like theirs.

I'd like to see this fixed, but it's too rarely used a feature to put
much work into fixing, when users can do equivalent queries like:

SELECT * FROM
WHERE the_geom && AND geometry_contains(, the_geom) AND contains(, the_geom);

dave

From strk at keybit.net  Fri Feb 27 10:10:14 2004
From: strk at keybit.net (strk)
Date: Fri, 27 Feb 2004 19:10:14 +0100
Subject: [postgis-devel] Re: geometry stats
In-Reply-To: <403E37DE.4070407@refractions.net>; from dblasby@refractions.net on Thu, Feb 26, 2004 at 10:15:58AM -0800
References: <8F4A22E017460A458DB7BBAB65CA6AE5026578@openmanage> <403E37DE.4070407@refractions.net>
Message-ID: <20040227191014.A1638@freek.keybit.net>

dblasby wrote:
> I dont think there's ever been a report of '&&' (overlap) being broken.

I've seen with my own eyes different result sets when using a sequential
scan vs an index scan. The results became the same when replacing the
call to box_overlap with a call to pgbox_overlap (made by copying from
geometry_overlap).

> I dont think the original postgresql rtree implementations
> (rtree_box_overlap()) are incorrect (the postgis ones are based on them
> and I checked them for mistakes).
> Its more likely in the rtree_internal_consistent() function.

rtree_internal_consistent() calls pvbox_overlap (previously box_overlap)
for both RTContainedByStrategyNumber and RTOverlapStrategyNumber.
Could this be the problem ?

> We should
> test oleg and teodora's polygon gist rtree implementation to see if it
> correctly handles index searches like "@@" (etc). Then we can have them
> check it out and patch our implementation since its pretty much
> *exactly* like theirs.

rtree_internal_consistent() *actually* calls theirs (apart from
pgbox_overlap just introduced).

> I'd like to see this fixed, but its a very rarely used feature to put
> too much work into fixing where they can do equivelent queries like:
>
> SELECT * FROM
> WHERE the_geom && AND
> geometry_contains(, the_geom) AND contains(, the_geom);

Again, the bug I've seen was on the the_geom && alone.
I'm not aware of other bugs... but I suppose there are
other cases. We can just wait for bug reports and take no further
step now.

> dave

--strk;

From strk at keybit.net  Sat Feb 28 06:52:33 2004
From: strk at keybit.net (strk)
Date: Sat, 28 Feb 2004 15:52:33 +0100
Subject: [postgis-devel] Re: sql files proliferation
In-Reply-To: <20040226191623.A92340@freek.keybit.net>; from strk@keybit.net on Thu, Feb 26, 2004 at 07:16:23PM +0100
References: <20040226175229.A91634@freek.keybit.net> <403E3658.9070304@refractions.net> <20040226191623.A92340@freek.keybit.net>
Message-ID: <20040228155233.A11046@freek.keybit.net>

I've added an experimental source sql file and a corresponding rule
in the Makefile. The CPP call uses the -traditional-cpp flag, which
is nicer with multiline strings and '--' comments.

If you can give it a try we can test for compatibility.

Run make postgis_new.sql (it's not built by default) to
get postgis_new.sql generated from postgis.sql.in.
The source file (postgis.sql.in) is not optimized to use
cpp features, it is just a copy of all the files previously
used, wrapped in #if USE_VERSION == blocks.

Running a diff between postgis.sql and postgis_new.sql should
give you just comments, blank lines and the postgis_version()
function with doubled quotes ('') instead of the slash escaped (/')
quote inside the SQL function.

--strk;

From pramsey at refractions.net  Sat Feb 28 10:42:32 2004
From: pramsey at refractions.net (Paul Ramsey)
Date: Sat, 28 Feb 2004 10:42:32 -0800
Subject: [postgis-devel] Re: sql files proliferation
In-Reply-To: <20040228155233.A11046@freek.keybit.net>
Message-ID:

OS/X cpp chokes for reasons I have yet to tease out. Probably not a
good sign for platform portability...
cpp: Too many arguments

On Saturday, February 28, 2004, at 06:52 AM, strk wrote:

> I've added an experimental source sql file and a corresponding rule
> in the Makefile. CPP call is with -traditional-cpp flag, which
> is nicer with multiline strings and '--' comments.
>
> If you can give it a try we can test for compatibility.
>
> Run make postgis_new.sql (it's not built by default) to
> get postgis_new.sql generated from postgis.sql.in.
> The source file (postgis.sql.in) is not optimized to use
> cpp features, it just a copy of all the files previously
> used wrapped in #if USE_VERSION == blocks.
>
> Runnign a diff between postgis.sql and postgis_new.sql should
> give you just comments, blank lines and the postgis_version()
> function with double quotes ('') instead of slash escaped (/')
> quote inside the SQL function.
>
> --strk;

     Paul Ramsey
     Refractions Research
     Email: pramsey at refractions.net
     Phone: (250) 885-0632

From strk at keybit.net  Sun Feb 29 14:43:14 2004
From: strk at keybit.net (strk)
Date: Sun, 29 Feb 2004 23:43:14 +0100
Subject: [postgis-devel] Re: sql files proliferation
In-Reply-To: ; from pramsey@refractions.net on Sat, Feb 28, 2004 at 10:42:32AM -0800
References: <20040228155233.A11046@freek.keybit.net>
Message-ID: <20040229234314.C21982@freek.keybit.net>

pramsey wrote:
> OS/X cpp chokes for reasons I have yet to tease out. Probably not a
> good sign for platform portability...
>
> cpp: Too many arguments

We could either find the right way to call CPP (autoconf?)
or maybe opt for a perl script, since this is already required,
to perform basic cpp (traditional) work.

--strk;

From pramsey at refractions.net  Sun Feb 29 16:48:35 2004
From: pramsey at refractions.net (Paul Ramsey)
Date: Sun, 29 Feb 2004 16:48:35 -0800
Subject: [postgis-devel] Re: sql files proliferation
In-Reply-To: <20040229234314.C21982@freek.keybit.net>
Message-ID: <2BAB94E6-6B1A-11D8-A45E-000393D33C2E@refractions.net>

Confusingly, I have yet to find *any* way to call cpp that works. In
fact, it does not appear to work the way the man page says it should...
it resolutely refuses to write an output file. Still trying.

P.

On Sunday, February 29, 2004, at 02:43 PM, strk wrote:

> pramsey wrote:
>> OS/X cpp chokes for reasons I have yet to tease out. Probably not a
>> good sign for platform portability...
>>
>> cpp: Too many arguments
>
> We could either find the right way to call CPP (autoconf?)
> or maybe opt for a perl script, since this is already required,
> to perform basic cpp (traditional) work.
>
> --strk;

     Paul Ramsey
     Refractions Research
     Email: pramsey at refractions.net
     Phone: (250) 885-0632
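For reference, the fallback strk mentions above (a small script doing only the basic cpp work that postgis.sql.in needs) could look roughly like the sketch below. It is written in C here rather than Perl, and it is only an illustration under stated assumptions: the names filter_state, filter_line and filter_stream are hypothetical, the version number 73 is made up for the example, and only non-nested "#if USE_VERSION == N" / "#endif" pairs are understood. It is not the rule that was actually added to the Makefile.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* State for the filter: the version we are building for, and whether
 * the lines currently being read should be emitted.
 * (Hypothetical type, not part of the PostGIS sources.) */
typedef struct {
    int target;   /* e.g. 73; the numbering scheme is an assumption */
    int active;   /* 1 outside any block or inside a matching block */
} filter_state;

/* Decide whether one input line should be copied to the output.
 * Only "#if USE_VERSION == N" and "#endif" are understood: no nesting,
 * no #else, no macro expansion. Directive lines are never emitted. */
int filter_line(filter_state *st, const char *line)
{
    int version;

    if (sscanf(line, "#if USE_VERSION == %d", &version) == 1) {
        st->active = (version == st->target);
        return 0;
    }
    if (strncmp(line, "#endif", 6) == 0) {
        st->active = 1;
        return 0;
    }
    return st->active;
}

/* Copy `in` to `out`, keeping only the lines valid for `target`.
 * Lines longer than the buffer arrive in several chunks; a chunk that
 * happens to begin with one of the two directives would be misread. */
void filter_stream(FILE *in, FILE *out, int target)
{
    filter_state st = { target, 1 };
    char buf[4096];

    assert(in != NULL && out != NULL);
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (filter_line(&st, buf))
            fputs(buf, out);
    }
}
```

A tiny main calling filter_stream(stdin, stdout, 73) would then stand in for the cpp call in the Makefile rule. Unlike cpp, this never touches SQL strings or '--' comments and emits no linemarkers, at the price of supporting nothing beyond the two directives.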