[Slony1-general] Replication problem of "BIG" table

Thu Mar 13 09:53:16 PDT 2008

On 3/13/2008 12:25 PM, Emanuel Petr wrote:
> Cédric Villemain wrote:
>> On Thursday 13 March 2008, Emanuel Petr wrote:
>>> Hi all,
>>> we have a problem replicating a 12 GB table.
>>>
>>> Note: other, smaller tables were replicated without problems.
>>>
>>> Here is what I see on the "slave" node.
>>> $ grep 'action"' /var/log/slony1/slon-db.log
>>> DEBUG2 remoteWorkerThread_1: prepare to copy table "public"."action"
>>> DEBUG2 remoteWorkerThread_1: copy table "public"."action"
>>> DEBUG2 remoteWorkerThread_1: Begin COPY of table "public"."action"
>>> NOTICE:  truncate of "public"."action" failed - doing delete
>>>
>>> DEBUG2 remoteWorkerThread_1: prepare to copy table "public"."action"
>>> DEBUG2 remoteWorkerThread_1: copy table "public"."action"
>>> DEBUG2 remoteWorkerThread_1: Begin COPY of table "public"."action"
>>> NOTICE:  truncate of "public"."action" failed - doing delete
>>> DEBUG2 remoteWorkerThread_1: 6584477358 bytes copied for table "public"."action"
>>>
>>> DEBUG2 remoteWorkerThread_1: prepare to copy table "public"."action"
>>> DEBUG2 remoteWorkerThread_1: copy table "public"."action"
>>> DEBUG2 remoteWorkerThread_1: Begin COPY of table "public"."action"
>>> NOTICE:  truncate of "public"."action" failed - doing delete
>>> DEBUG2 remoteWorkerThread_1: 6587572735 bytes copied for table "public"."action"
>>>
>>> DEBUG2 remoteWorkerThread_1: prepare to copy table "public"."action"
>>> DEBUG2 remoteWorkerThread_1: copy table "public"."action"
>>> DEBUG2 remoteWorkerThread_1: Begin COPY of table "public"."action"
>>> NOTICE:  truncate of "public"."action" failed - doing delete
>>>
>>> I can't find any error message in the log.
>> 
>> I think you have probably made other attempts before.
> 
> No, there was only one attempt. The log was meant to show you that the 
> COPY action for this table runs in a loop.

The question is what aborts the subscription process. Since the copy 
operation in the second and third attempts above did succeed, the problem 
might not be related to that table itself but to something that happens 
later. Can you provide the entire log from one "prepare to copy ..." to the next?
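Pulling out that stretch of log by hand is tedious for a multi-GB copy, so here is a minimal Python sketch. It assumes the slon log uses the same DEBUG2 wording quoted in this thread ("prepare to copy table ...", "... bytes copied for table ..."); other Slony versions or log prefixes may word things differently. The helpers `copy_attempts` and `bytes_copied` are hypothetical names, not part of Slony. The sketch splits the log into one segment per copy attempt of a table, keeping everything logged after that attempt (including work on other tables) until the next attempt of the same table, and reports whether each attempt reached its "bytes copied" line.

```python
import re
from typing import List, Optional

# Assumption: these patterns match the slon DEBUG2 lines quoted in this
# thread; adjust them if your slon version words the messages differently.
PREPARE_RE = re.compile(r"remoteWorkerThread_\d+: prepare to copy table (\S+)")
BYTES_RE = re.compile(r"remoteWorkerThread_\d+: (\d+) bytes copied for table (\S+)")


def copy_attempts(lines: List[str], table: str) -> List[List[str]]:
    """Split log lines into one segment per copy attempt of `table`.

    A segment starts at a "prepare to copy table <table>" line and keeps
    every following line, including lines about other tables, until the
    next "prepare to copy" line for the same table. Lines before the first
    attempt are dropped.
    """
    segments: List[List[str]] = []
    current: Optional[List[str]] = None
    for line in lines:
        match = PREPARE_RE.search(line)
        if match and match.group(1) == table:
            current = [line]
            segments.append(current)
        elif current is not None:
            current.append(line)
    return segments


def bytes_copied(segment: List[str], table: str) -> Optional[int]:
    """Return the byte count logged for `table` in this segment, or None
    if the attempt never logged a "bytes copied" line for it."""
    for line in segment:
        match = BYTES_RE.search(line)
        if match and match.group(2) == table:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    # Hypothetical usage against the real log:
    #   with open("/var/log/slony1/slon-db.log") as f:
    #       lines = f.read().splitlines()
    sample = [
        'DEBUG2 remoteWorkerThread_1: prepare to copy table "public"."action"',
        'DEBUG2 remoteWorkerThread_1: Begin COPY of table "public"."action"',
        'DEBUG2 remoteWorkerThread_1: 6584477358 bytes copied for table "public"."action"',
        'DEBUG2 remoteWorkerThread_1: prepare to copy table "public"."action"',
    ]
    table = '"public"."action"'
    for n, seg in enumerate(copy_attempts(sample, table), 1):
        print(f"attempt {n}: {len(seg)} lines, bytes copied: {bytes_copied(seg, table)}")
```

An attempt that logs its byte count and is still followed by another "prepare to copy" for the same table suggests the failure happened after that table's COPY finished; the lines between the two are the part of the log worth posting.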

Jan

> 
>> 
>> If for any reason Slony cannot truncate the table before replicating it, 
>> it deletes every row instead.
> 
> The table on the "slave" node was empty before the replication started.
> The message "truncate of ... failed - doing delete" also appears for 
> other tables, which were replicated without problems, so I don't think 
> that is the issue.
> 
> The problem occurs only with bigger (over 10 GB) tables.
> 
>> 
>> 68 GB / 12 GB suggests about 5 attempts to replicate the table?
>> 
>> I missed the versions of everything you are using (OS, kernel, 
>> PostgreSQL and Slony).
>> 
>> You must manually truncate the tables on the 'slave', then restart your 
>> replica from scratch.
> 
> And now I'm not able to work with this "badly" replicated table. Every 
> command on this table hangs. Note: Slony is stopped.
> 
> 
>> 
>>> The COPY event for this "big" table is in a loop and the table size on
>>> the "slave" node keeps growing.
>>>
>>> On the "master" node, the table size is 12 GB.
>>> On the "slave" node, the table size was 68 GB before I stopped the
>>> replication.
>>>
>>>
>>> -------------
>>>
>>> OS: Ubuntu 6.06.2 LTS , 2.6.15-29-amd64-server, x86_64
>>>
>>> DB: postgresql-8.1
>>>
>>> SLONY: Version: 1.2.9-2ubuntu0~dapper0
>>>
>>>
>>> Detail of our problematic "12GB" table
>>> =# \d+ action;
>>>                                              Table "public.action"
>>>    Column   |            Type             |                      Modifiers                      | Description
>>> ------------+-----------------------------+-----------------------------------------------------+-------------
>>>  id         | integer                     | not null default nextval('action_id_seq'::regclass) |
>>>  clientid   | integer                     |                                                     |
>>>  action     | integer                     | not null                                            |
>>>  response   | integer                     |                                                     |
>>>  startdate  | timestamp without time zone | not null default now()                              |
>>>  clienttrid | character varying(128)      | not null                                            |
>>>  enddate    | timestamp without time zone |                                                     |
>>>  servertrid | character varying(128)      |                                                     |
>>> Indexes:
>>>      "action_pkey" PRIMARY KEY, btree (id)
>>>      "action_servertrid_key" UNIQUE, btree (servertrid)
>>>      "action_action_idx" btree ("action")
>>>      "action_clientid_idx" btree (clientid)
>>>      "action_response_idx" btree (response)
>>>      "action_startdate_idx" btree (startdate)
>>> Foreign-key constraints:
>>>      "action_action_fkey" FOREIGN KEY ("action") REFERENCES enum_action(id)
>>>      "action_clientid_fkey" FOREIGN KEY (clientid) REFERENCES "login"(id)
>>>      "action_response_fkey" FOREIGN KEY (response) REFERENCES enum_error(id)
>>>
>>>
>>> ---------------
>>>
>>> Does anyone have an idea what could be wrong?
>>>
>>> Thanks,
>>> Petr
>>> _______________________________________________
>>> Slony1-general mailing list
>>> Slony1-general at lists.slony.info
>>> http://lists.slony.info/mailman/listinfo/slony1-general
>> 
>> 
>> 

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #