[Slony1-general] UTF-8 Data

Wed Sep 13 02:52:06 PDT 2006

----- Original Message ----- 
From: "Christopher Browne" <cbbrowne at ca.afilias.info>
To: "Marcin Mank" <marcin.mank at gmail.com>
Cc: "Steve Burrows" <steve at jla.com>; "Slony1 general"
<slony1-general at gborg.postgresql.org>
Sent: Tuesday, September 12, 2006 10:54 PM
Subject: UTF-8 Data

> Marcin Mank wrote:
> >> There surely should be some better way, such as finding which specific
> >> tuples are problematic, and updating them on the source database.
> >>
> >> A thought...  You might do two dumps:
> >>
> >> 1.  Raw, no conversion using iconv
> >>
> >> 2.  Another, which converts using iconv [generate this by passing
> >>     file #1 thru iconv...]
> >
> >
> > I am just struggling with this issue, my solution:
> >
> > CREATE OR REPLACE FUNCTION utf8_encode(text)
> >   RETURNS text AS
> > $BODY$
> > my ($s) = @_;
> > utf8::encode($s);  # converts the string in place to UTF-8 octets
> > return $s;
> > $BODY$
> >   LANGUAGE 'plperlu' IMMUTABLE;
> >
> >
> >
> > and now :
> >
> > foreach suspect table {
> >     update table set field=utf8_encode(field)
> >         where field <> utf8_encode(field)
> > }
> >
> > kinda slow, but might be good enough.
> >
> > Greetings
> > Marcin
> >
>
> That is likely to break when the field can be NULL, right?  After all
> NULL <> NULL...

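(NULL <> NULL) is NULL;
(NULL <> 'anything') is NULL;

A WHERE clause only matches rows for which the condition is true, so rows
where the field is NULL are simply skipped. There is nothing to encode in
them anyway, so it works. Ternary logic can trick anyone.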
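
For anyone who wants to check the NULL behaviour without touching a real
database, here is a minimal sketch in Python. It makes two assumptions that
are not from this thread: it uses an in-memory SQLite database instead of
PostgreSQL (the comparison rules for NULL are the same), and it uses the
built-in upper() as a stand-in for utf8_encode(), since only the shape of
the WHERE clause matters here. The table name t is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A comparison involving NULL yields NULL (None in Python), not true or false.
null_vs_null = cur.execute("SELECT NULL <> NULL").fetchone()[0]
null_vs_text = cur.execute("SELECT NULL <> 'anything'").fetchone()[0]

# Hypothetical table with one NULL row between two ordinary rows.
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field TEXT)")
cur.executemany("INSERT INTO t (field) VALUES (?)",
                [("abc",), (None,), ("xyz",)])

# Same shape as the update in the quoted message; upper() stands in for
# utf8_encode(). The NULL row makes the condition NULL, so it is not matched.
cur.execute("UPDATE t SET field = upper(field) WHERE field <> upper(field)")
updated = cur.rowcount

rows = cur.execute("SELECT field FROM t ORDER BY id").fetchall()
print(null_vs_null, null_vs_text, updated, rows)
```

The NULL row is neither updated nor does it raise an error; it just falls
out of the WHERE clause.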

Greetings
Marcin