[postgis-users] Problem with shp2pgsql and the -W character encoding option

Mark Cave-Ayland mark.cave-ayland at ilande.co.uk
Fri Feb 7 09:18:56 PST 2014


On 07/02/14 16:21, Michael Fricker wrote:

> Hello. I’ve run into a problem with shp2pgsql at the command prompt:
> the -W option, which should change the default character encoding from
> UTF-8 to, for example, LATIN1, is not working. If I’m correct, the -W
> option should allow you to switch from the default UTF-8 to any other
> character encoding that is required. In my case it is LATIN1. I used
> the following command to write the .sql file.
>
> shp2pgsql.exe -d -s 2956:3401 -I -W "LATIN1"
> "C:\0\Canada_Land_Survey_Data\test\Deliverables\CLSR_Land_Parcel.shp"
> scratch.ab_clsr_land_parcel >
> C:\0\Canada_Land_Survey_Data\Data_Dump\Land_Parcel_Error.sql
>
> A portion of the result is shown below.
>
> SET CLIENT_ENCODING TO UTF8;
> SET STANDARD_CONFORMING_STRINGS TO ON;
> SELECT DropGeometryColumn('scratch','ab_clsr_land_parcel','geom');
> DROP TABLE "scratch"."ab_clsr_land_parcel";
> BEGIN;
> CREATE TABLE "scratch"."ab_clsr_land_parcel" (gid serial,
> "pin" float8,
> "designator" varchar(254),
> "remain_ind" varchar(254),
> "planno" varchar(30),
> "admin_code" varchar(20),
> "pcl_type" varchar(254),
> "lcselectio" varchar(100),
> "remarks" varchar(254),
> "pcl_state" varchar(254),
> "reg_pin" varchar(254));
> ALTER TABLE "scratch"."ab_clsr_land_parcel" ADD PRIMARY KEY (gid);
> SELECT AddGeometryColumn('scratch','ab_clsr_land_parcel','geom','3401','MULTIPOLYGON',2);
>
> I would have expected the first line to have read SET CLIENT_ENCODING TO
> LATIN1 instead of SET CLIENT_ENCODING TO UTF8.
>
> Initially I was attempting to upload a shapefile to a database at work
> using this command:
>
> shp2pgsql.exe -d -s 2956:3401 -I -W "LATIN1"
> "C:\0\Canada_Land_Survey_Data\test\Deliverables\CLSR_Land_Parcel.shp"
> scratch.ab_clsr_land_parcel | psql.exe -h "localhost" -p "5432" -U
> "postgres" -d "major_testing_db"
>
> Instead of being uploaded, the resulting error occurred:
>
> ERROR: unterminated quoted string at or near “’”
> LINE 1: …o”,”remarks”,”pcl_state”,”reg_pin”,geom) VALUES (’1115077’,’
> ^
>
> Reading through the .sql file, I tracked the problem down to a
> character not supported by UTF-8 in the following line (the small arrow
> highlighted in red):
>
> INSERT INTO "scratch"."ab_clsr_land_parcel"
> ("pin","designator","remain_ind","planno","admin_code","pcl_type","lcselectio","remarks","pcl_state","reg_pin",geom)
> VALUES ('1115077','EJERE K’ELNI KUE INDIAN RESERVE 196I','No','84988
> CLSR AB','09888','Inside Canada
> Lands',NULL,NULL,'ACTIVE',NULL,ST_Transform('01060000208C0B00…….
>
> When I used the shp2pgsql-gui and selected LATIN1 for the character
> encoding, the shapefile was uploaded to the database without incident.
> Any help and/or explanation would be appreciated. The version of both
> shp2pgsql and shp2pgsql-gui that I have used is 2.1.0. Thanks to anyone
> for their help.

Hi Michael,

In general terms, the loader works by forcing the client encoding to 
UTF8 and then using iconv to manually convert each input string from the 
source encoding (LATIN1 in your case) to UTF8.
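
This also explains why the generated file always begins with SET 
CLIENT_ENCODING TO UTF8, whatever you pass to -W: the option describes 
the input, not the output. As a rough illustration only (this is 
Python, not the loader's actual C code, and to_utf8 is a hypothetical 
helper name), the per-string conversion amounts to:

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from the source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# 0xE9 is "é" in LATIN1; in UTF-8 it becomes the two bytes 0xC3 0xA9.
latin1_value = b"Qu\xe9bec"
utf8_value = to_utf8(latin1_value, "latin-1")
print(utf8_value)  # b'Qu\xc3\xa9bec'
```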

AFAIK all LATIN1 characters can be converted to UTF8, so if you've found 
a character that doesn't convert then it's highly likely the source 
encoding of your shapefile isn't LATIN1. If you're based on Windows, you 
could try -W WIN1252 to specify the default Windows encoding to see if 
that helps at all.
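
To show why the choice between the two matters, here is a small Python 
sketch. It assumes, without having seen the .dbf, that the apostrophe 
in your "remarks" value is stored as the single byte 0x92; the sample 
bytes are made up for illustration. Windows-1252 maps 0x92 to the 
typographic apostrophe, whereas LATIN1 maps it to an unprintable 
control character:

```python
# Hypothetical sample: assumes the apostrophe is stored as byte 0x92.
raw = b"K\x92ELNI"

as_latin1 = raw.decode("latin-1")  # 0x92 -> U+0092, a C1 control character
as_cp1252 = raw.decode("cp1252")   # 0x92 -> U+2019, right single quotation mark

print(repr(as_latin1))  # 'K\x92ELNI'
print(repr(as_cp1252))  # 'K’ELNI'

# Both results encode to UTF-8 without error, but to different bytes.
print(as_latin1.encode("utf-8"))  # b'K\xc2\x92ELNI'
print(as_cp1252.encode("utf-8"))  # b'K\xe2\x80\x99ELNI'
```

Both decodings succeed, so a wrong -W value would not necessarily show 
up as a conversion failure; it can simply produce the wrong character.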

The big mystery at the moment is why there is a difference between the 
command line and GUI versions as they both use the same engine 
internally. I know that older versions of the GUI used to default to a 
different encoding, so could it be that you're seeing this change after 
upgrading from a pre-2.1 version?


ATB,

Mark.

