I have a very simple table:
create table foo(z varchar not null, bar int not null, primary key(z));
now, I try to insert bunch of data to it using:
insert into foo(bar, z) select generate_series(1,100) as bar, encode( gen_random_bytes(100), 'escape') as z;
on slon's insert db will reply:
ERROR: invalid byte sequence for encoding "UTF8": 0xdf15
which is quite right, since data is not escaped. I am inserting random data, on purpose - because we do have that in here, and it works fine with my C code, that so far replicated the data.
Further more, I think if slony would use PQexecParams instead of PQexec to insert data, that problem wouldn't arose in first place.
What are the encodings of the two databases (e.g. - origin + subscriber)?
My suspicion is that either:
a) Both use UTF-8, but the origin is on PostgreSQL 7.4 or 8.0, which had problems with its handling of Unicode encoding validation, or
b) The origin is using SQL-ASCII or similar, and the subscriber is using Unicode/UTF-8.
If either of those speculations are in fact the case, then this may NOT be a bug; Slony-I does not promise to do inter-encoding translations.
Question to consider:
Should the slon verify that it uses a single common encoding on all DB connections?
We intend to address this in a different way, by ensuring that slon processes are using the same client encoding on all databases.
Changing importance to "normal". This can be avoided by using
alter user slony set client_encoding to <common-encoding>;
on all servers in the Slony cluster and NOT having any of the databases use server_encoding SQL_ASCII (which is discouraged anyway).
We still want to fix it, but since this is a completely new feature, we won't implement it before 2.1.