Steve Singer steve at ssinger.info
Mon Jul 27 16:49:15 PDT 2015
On 07/27/2015 08:57 AM, Jan Wieck wrote:
> On 07/26/2015 10:48 PM, Steve Singer wrote:
>> On 07/26/2015 08:01 PM, Jan Wieck wrote:
>>> On 07/26/2015 07:35 PM, Jan Wieck wrote:
>>>> That said, pg_dump isn't that much faster than Slony's copy_set() so
>>>> we'd need to find a way to reconstruct an sl_setsync entry from
>>>> something like a binary base backup. That is not trivial.
>>>
>>> Here is a wild idea that I am going to test, but someone may throw a
>>> wrench into it while I'm doing that:
>>>
>>> Let us assume that the application can be stopped and the database
>>> brought into so called "single user mode" for a short period of time.
>>> During that maintenance window we do the following:
>>>
>>>     * Shutdown the application
>>>     * Create the cluster with two nodes and paths via slonik script.
>>>     * Shutdown the slon for the slave.
>>>     * Create a SYNC event on the master.
>>>     * Execute a SUBSCRIBE SET ... OMIT COPY.
>>>     * Create an LVM snapshot of $PGDATA (+ pg_xlog ...)
>>>     * Resume the application
>>>
>>> That should take no longer than a few minutes. Properly scripted it
>>> could be in the second range. We now could
>>>
>>>     * Use LVM snapshot(s) to plow over the slave's $PGDATA and so on.
>>>     * Modify the slave's DB to correct the local node ID and fake an
>>>       sl_setsync entry reflecting the SYNC created on the master.
>>>     * Fire up the slave's slon.
>>>
>>>
>>> Comments?
>>>
>>
>> I think that would work.
>>
>> Is there  a way we could do this without an application outage.
>>
>> It seems to me that if you take a backup of the master wih pg_basebackup
>> or through any other consistent method of a binary backup you could look
>> at the restored instance and say
>>
>> * Any transactions that show as committed in the last SYNC that shows up
>> in the restored sl_event table are committed
>> * Any transactions that were not included in the above, but have actions
>> in sl_log_1 or sl_log_2 on the restored instance were committed at the
>> time of the backup (otherwise the rows inserted by those transactions
>> couldn't be in sl_log)
>> * Any transactions that were committed by the next SYNC on the origin
>> but don't show up in the above list can be pulled as part of the first
>> real SYNC to be processed, ie you would show those as 'in progress' (or
>> not yet started) for fake sl_setsync value you create.
>
> I think this is insufficient. Transactions that were in progress when
> the backup started are not guaranteed to be committed by the next SYNC.
> In fact they could still be in progress many SYNCs later.
>

Exactly but those transactions will show up in the in_progress list for 
the next sync.    The data from them won't be part of the snapshot.

Consider the case where you have an origin with

SYNC 1000:1004:1000,1001,1002
and then transaction 1000 and 1002 commits after the SYNC but before the 
backup snapshot.


The next SYNC (which isn't included in the backup) would be
SYNC 1001:1006,1001

Once we restore the snapshot on the new node slony would need to look at 
the origin and see 1001:1006,1001.

It would then determine that transactions 1000,1002,1004,1005 all 
committed in between the first SYNC and the second SYNC.

It would then search sl_log_1 and sl_log_2 on the new node to see if 
there are log rows from any of those 4 transactions.   In this case it 
would see transactions in sl_log for 1000 and 1002.   Therefore the 
transactions that need to be grabbed by the first real sync on the new 
node are 1004,1005.

The fake sl_setsync entry could be something like
1001:1005:1004


Or am I still missing something?


> What probably could work to produce that snapshot is to wait for the
> sl_log, that was active (inserted into) at backup time to be closed -
> IOW a log switch to complete. All xids found in that log on the master
> EXCEPT all xids in that log on the restored slave are to be considered
> in progress at backup time.
>
> This should build an artificial snapshot that tells us where to start.
> Any SYNC after the log switch completed would make for a suitable next
> SYNC to leap to. So we would record this artificial snapshot as a SYNC
> with ev_seqno - 1 of the next SYNC.
>
>
> Regards, Jan
>



More information about the Slony1-general mailing list