aiwaniuk at instytut.com.pl
Wed May 26 05:18:10 PDT 2010
On Tue, May 25, 2010 at 09:49:19AM -0400, Steve Singer wrote:
> > segfault at 273936 ip 00007f359f4f4c40 sp 00007f359b982698 error 4 in
> > libc-2.9.so[7f359f474000+168000]
> 
> Can you run gdb against a core file, or start slony up inside of gdb, so 
> we can get a stack trace of what slon was doing when it died?
> 
> (from a build with debugging symbols would be even more useful)

I've faced the same problem while replicating PostgreSQL 8.3 using Slony
2.0.3.
The subscribe operation finished successfully, but then while synchronizing
tables I still get segfaults. The backtrace showed:

#0  0xb7761f93 in strlen () from /lib/libc.so.6
#1  0x0805e35d in slon_appendquery_int (dsp=0xb4b11a38, fmt=0x8068af5
"_log_%d_",ap=0xb42fe1cc "\f`Ä´\022ˇ\t\b\036ˇ\t\b0Ĺ„\006\b\020ˇ\t\b\022ˇ\t\b\036ˇ\t\b\214ă/´đ\vÄ
´\b\fÄ
´\002")   at /usr/include/bits/string3.h:52
#2  0x0805e7d7 in snprintf (conn=0xb4b11a38) at /usr/include/bits/stdio2.h:65
#3  db_checkSchemaVersion (conn=0xb4b11a38) at dbutils.c:311
#4  0x08051f75 in sync_helper (cdata=0xb4c4600c) at remote_worker.c:4993
#5  0xb783c42f in start_thread () from /lib/libpthread.so.0
#6  0xb77be79e in clone () from /lib/libc.so.6
    

I looked through the sources (dbutils.c). In the function slon_appendquery_int,
in
switch (*fmt)
        {
                        case 's':
while executing dstring_append(dsp, s) (which is in fact a macro), there
is indeed a call to strlen. I thought the problem might actually lie in the
dstring_nappend function (executed after strlen), which does some memory
allocation. But the code in Slony 1.2 (which I'm using successfully right now)
looks the same.
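
To illustrate the ordering described above, here is a minimal sketch of a
%s-style formatter built on a growable string. This is not Slony's code: DStr,
dstr_init, dstr_nappend, dstr_append and append_query are hypothetical
stand-ins for the dstring and slon_appendquery_int machinery. The point it
tries to show is that strlen runs on the pointer pulled out of the va_list
before any allocation happens, so a crash inside strlen would suggest the
pointer itself was already bad rather than a fault in the reallocation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; a stand-in, not Slony's real type. */
typedef struct {
    char  *data;
    size_t len;
    size_t cap;
} DStr;

static void dstr_init(DStr *ds)
{
    ds->cap = 16;
    ds->len = 0;
    ds->data = malloc(ds->cap);
    assert(ds->data != NULL);
    ds->data[0] = '\0';
}

/* Append exactly n bytes, growing the buffer when needed. */
static void dstr_nappend(DStr *ds, const char *s, size_t n)
{
    if (ds->len + n + 1 > ds->cap) {
        while (ds->len + n + 1 > ds->cap)
            ds->cap *= 2;
        ds->data = realloc(ds->data, ds->cap);
        assert(ds->data != NULL);
    }
    memcpy(ds->data + ds->len, s, n);
    ds->len += n;
    ds->data[ds->len] = '\0';
}

/* As a macro, strlen(s) is evaluated on the caller's pointer first;
 * the allocation inside dstr_nappend only happens afterwards. */
#define dstr_append(ds, s) dstr_nappend((ds), (s), strlen(s))

/* Tiny formatter that understands only %s and %d. */
static void append_query(DStr *ds, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    while (*fmt) {
        char c;

        if (*fmt != '%') {
            dstr_nappend(ds, fmt++, 1);   /* copy literal character */
            continue;
        }
        fmt++;                            /* skip the '%' */
        if (*fmt == '\0')
            break;                        /* lone trailing '%' */
        c = *fmt++;
        switch (c) {
        case 's': {
            char *s = va_arg(ap, char *);
            /* This guard catches only NULL. A dangling or garbage
             * pointer would still fault inside strlen below. */
            if (s == NULL)
                s = "(null)";
            dstr_append(ds, s);
            break;
        }
        case 'd': {
            char buf[32];
            snprintf(buf, sizeof buf, "%d", va_arg(ap, int));
            dstr_append(ds, buf);
            break;
        }
        default:
            dstr_nappend(ds, &c, 1);      /* unknown specifier: copy as-is */
            break;
        }
    }
    va_end(ap);
}
```

With this sketch, a call such as append_query(&ds, "select * from %s where id
= %d", "sl_log_1", 42) builds the query text in ds.data. If the argument
matching %s were not a valid C string (wrong argument type for the format, or
memory that has already been freed or overwritten), the fault would show up in
strlen, much like frame #0 of the backtrace.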

Then I thought that maybe this is specifically an OS/platform issue, but running
the slave slon on the slave machine and on the master machine gives the same
result (both Linux, but one x86 and the other amd64, with different glibcs and
different kernels).

I ended up concluding that the data structure or a row in the replicated table
is the cause: one row of replicated data occupies 300kB of space (a select
saved using psql with \o).



The funny thing is the way Slony created the replication SQL (this is Slony's
output):
2010-04-28 14:51:45 CESTERROR  remoteWorkerThread_1:
insert into TABLE1 (....) values (13314972,................); - OK
insert into TABLE2 (....) values (..,13314972,............); - there is a
reference from table2 to table1, then a weird entry
insert into TABLE1 rt into TABLE1 (....) values (13314972,................); - again, but modified
insert into TABLE2 (....) values (..,13314972,............); - again the
same
insert into TABLE1 rt into TABLE1 (....) values (13314972,................);
insert into TABLE2 (....) values (..,13314972,............);
insert into TABLE1 rt into TABLE1 (....) values (13314972,................); 
insert into TABLE2 (....) values (..,13314972,............); 
insert into TABLE1 rt into TABLE1 (....) values (13314972,................);
insert into TABLE2 (....) values (..,13314972,............); 
insert into TABLE1 rt into TABLE1 (....) values (13314972,................); 
insert into TABLE2 (....) values (..,13314972,............); 
insert into TABLE3 ................
insert into TABLE3 ................
insert into TABLE3 ................
insert into TABLE3 ................
insert into TABLE3 ................
ERROR:  syntax error at or near "rt"
LINE 3: insert into TABLE1" rt into "pub...
- qualification was:
2010-04-28 14:51:45 CESTERROR  remoteWorkerThread_1: SYNC aborted




I finally gave up and turned to other things.

-- 
IaS


More information about the Slony1-general mailing list