Bug 336 - Review transaction isolation level requirements for slon database connections.
Summary: Review transaction isolation level requirements for slon database connections.
Status: NEW
Alias: None
Product: Slony-I
Classification: Unclassified
Component: slon (show other bugs)
Version: devel
Hardware: All All
: low normal
Assignee: Slony Bugs List
URL:
Depends on:
Blocks:
 
Reported: 2014-03-19 13:26 UTC by Jan Wieck
Modified: 2015-12-10 11:17 UTC (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jan Wieck 2014-03-19 13:26:38 UTC
slon currently uses READ ONLY ISOLATION LEVEL SERIALIZABLE DEFERRABLE for some transactions, in particular the remote listener. This causes event propagation to stall during another long running, serializable transaction.

I believe that only event creation and copy_set done against the origin require serializable. The copy_set can use serializable deferrable. All other slon transaction should work fine with isolation level repeatable read.
Comment 1 Tom Tignor 2015-12-10 11:12:47 UTC
The following is a summary of discussion on a crippling performance problem resolved by reducing the remote listener from serializable to repeatable read isolation. Per Steve S, I'm using this ticket to recommend we move forward with a solution like this in some future release. The consensus from the discussion appeared to be:

 - Reduced isolation for the remote listener could be safe and effective for certain workloads.
 - Should provide as a non-default runtime option.
 - Should include big red warnings in the documentation.

----


Hello slony1 community,
I’m part of a team at Akamai working on a notification service based on postgres. (We call it an Alert Management System.) We’re at the point where we need to scale past the single instance DB and so have been working with slony1-2.2.4 (and postgresql-9.1.18) to make that happen. Most tests in the past few months have been great, but in recent tests the reassuring SYNC-event-output-per-two-seconds suddenly disappeared. Throughout the day, it returns for a few minutes (normally less than 5, never 10) and then re-enters limbo. Vigorous debugging ensued, and the problem was proven to be the serializable isolation level set in slon/remote_listen.c. Our recent test environment doesn’t have a tremendous write rate (measured in KB/s), but it does have 200-400 clients at any one time, which may be a factor. Below is the stack shown in gdb of the postgres server proc (identified via pg_stat_activity) while slon is in limbo.
What are the thoughts on possible changes to the remote listener isolation level and their impact? I’ve tested changes using repeatable read instead, and also with serializable but dropping the deferrable option. The latter offers little improvement if any, but the former seems to return us to healthy replication. In searching around, I found Jan W filed Bug 336 last year (link below) which suggests we could relax the isolation level here and elsewhere. If it was helpful, I could verify an agreed solution and submit it back as a patch. (Not really in the slony community yet, just looking at the process now.)
Thanks in advance,

http://www.slony.info/bugzilla/show_bug.cgi?id=336


(gdb) thread apply all bt

Thread 1 (process 13052):
#0  0xffffe430 in __kernel_vsyscall ()
#1  0xf76d2c0f in semop () from /lib32/libc.so.6
#2  0x08275a26 in PGSemaphoreLock (sema=0xf69d6784, interruptOK=1 '\001') at pg_sema.c:424
#3  0x082b52cb in ProcWaitForSignal () at proc.c:1443
#4  0x082bb57a in GetSafeSnapshot (origSnapshot=<optimized out>) at predicate.c:1520
#5  RegisterSerializableTransaction (snapshot=0x88105a0) at predicate.c:1580
#6  0x083b3f35 in GetTransactionSnapshot () at snapmgr.c:138
#7  0x082c460a in exec_simple_query (
    query_string=0xa87d248 "select ev_origin, ev_seqno, ev_timestamp,        ev_snapshot,        \"pg_catalog\".txid_snapshot_xmin(ev_snapshot),        \"pg_catalog\".txid_snapshot_xmax(ev_snapshot),        ev_type,        ev_data1,"...)
    at postgres.c:948
#8  PostgresMain (argc=1, argv=0xa7cd1e0, dbname=0xa7cd1d0 "ams", username=0xa7cd1b8 "ams_slony") at postgres.c:4021
#9  0x08284a58 in BackendRun (port=0xa808118) at postmaster.c:3657
#10 BackendStartup (port=0xa808118) at postmaster.c:3330
#11 ServerLoop () at postmaster.c:1483
#12 0x082854d8 in PostmasterMain (argc=3, argv=0xa7ccb58) at postmaster.c:1144
#13 0x080cb430 in main (argc=3, argv=0xa7ccb58) at main.c:210
(gdb)


Tom    :-)

----

The last time we had a change to isolation levels was in response to
this thread


http://lists.slony.info/pipermail/slony1-general/2011-November/011939.html

Also know as bug #255 (http://www.slony.info/bugzilla/show_bug.cgi?id=255)

I can't recall if anyone figured out if we could reduce the remote
listener isolation level to read committed - read only or not.

One concern at the back of my mind is if a read only repeatable read
transactions would result in more pivot conflicts than a read only
serializable deferrable transaction where the conflicts are with a
serializable transaction running on the origin by some application
transaction.

----

	
	Hi Steve,
	Sorry for the delay getting back. Inspired by your questions, I¹ve been
reading up on SSI, the Cahill paper and slony1 and postgres code. To
answer your question, I don¹t believe reducing the isolation level for the
remote listener can increase pivot conflicts. As I understand pivots, they
sit in the middle of a ³dangerous structure,² on either side of an
rw-dependency relationship for two other transactions. So a read-only
transaction can¹t be a pivot. Also, since we¹re not changing the data
remote listener reads, I don¹t believe we¹d be creating new
rw-dependencies and so making pivots of other transactions.
	For us, I think there is a broader issue. I found the README-SSI in the
postgres 9.1.18 package. It seems clear the benefits of SSI in postgres
only arrive if all your transactions are serializable.

‹‹
    * Any transaction which is run at a transaction isolation level
other than SERIALIZABLE will not be affected by SSI.  If you want to
enforce business rules through SSI, all transactions should be run at
the SERIALIZABLE transaction isolation level, and that should
probably be set as the default.

‹‹

	Comments in predicate.c also seem to support the idea.
	I believe all the apps in our DB (other than slony1) are using the
default read committed isolation level. As I review our DB-facing procs, I
can see listeners have rw-dependencies on the remote worker (via sl_event)
and the remote worker has an rw-dependency on any of our clients writing
to sl_log_1/2. As I understand SSI, that constitutes a ³dangerous
structure,² but we still can¹t expect postgres SSI to save us if the
clients are non-serializable. Under these conditions, what benefit comes
from serializable slony1 transactions?
	Maybe a solution could be to provide a reduced serialization level as a
runtime option? Requirements vary between apps. For bank transactions,
it¹s certainly clear that everything should be bulletproof. Far better to
get it done late than to do it wrong. For our notification service,
though, timeliness is more important. No ones likes losing data, but the
value of the data degrades in minutes (and unaddressed alarms are likely
to be regenerated.) It¹s far less tolerable to stop replication in its
tracks for long periods in order to achieve serializability.
	I see this message has gotten long. Thanks in advance for your time and
consideration.

	Tom    :-)

----

For whatever it's worth, when Slony was first designed SERIALIZABLE
was the only option -- the alternative was READ COMMITTED and IIRC
there were some corner cases where the data changing in the log tables
could make a big difference (memory fails, but I think it had to do
with log switching).  ISTR it was partly a lesson we learned from
erserver, where some of the basic concepts underlying Slony had been
worked out.

I'd advocate a lot of caution in making changes, ideally by analysing
all the kinds of transaction paths Slony could be seeing.  But with
new isolation levels available, it's probably worth doing some
analysis.

A

----

On Wed, Nov 18, 2015 at 02:26:15PM +0000, Tom Tignor wrote:
...
Sorry for the delay getting back. Inspired by your questions, I¹ve been
reading up on SSI, the Cahill paper and slony1 and postgres code.
...

It should be pointed out that 9.1 goes EOL (End of Life) in less than
a year (Sep 2016), and transaction handling has changed a *lot* since then,
so any changes that core Slony makes may not even work for you.

(FWIW, I think dropping the isolation level in this particular
instance seems safe, however.)

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8

----


	Greg, Andrew,
	Thanks for the feedback. Greg, can you describe the transaction handling
changes you’re referring to? I recently got the latest pg 9.4
distribution. The README-SSI is identical and while there have been some
changes in predicate.c, they don’t appear sweeping. The doc on Transaction
Isolation appears pretty much the same too.
	A general question for the group: if we would consider a change like this
(as a runtime option or otherwise), what’s the correct way to move it
forward? Should I file a bug? Are there specific tests or analysis which
should be performed?

	Tom    :-)

----

I would use bug 336 for this.
My gut is to make this a runtime option since I am not confident that
downgrading the isolation level won't impact someones workload, having
said that I am open to being convinced otherwise.  I would also be
inclined to leave the default value unchanged for 2.2.x and then maybe
consider changing the default in 2.3.   If we actually any concrete
plans for a 2.3 version I would say just leave the change for 2.3 but I
don't see a 2.3 release happening soon and I don't want to change the
default behaviour in a minor release.


Generally when we make a change like this I try to do lots of runs
through the disorder/clustertest suite that doesn't cover all types of
workloads.  Have you tried this change on your workload? Does it
actually help things?

It would be great if Jan and Chris were to comment on this thread

----


	Yes indeed. With this change, we see replication on all three subscriber
nodes in the 2-5 secs range per SYNC event. (We’re configured with the
2-sec default.) Without it, the subscribers spend most of the day
paralyzed, with sporadic blocks of some minutes of consciousness which
they use to frenetically catch up.
	Figure I’ll await others’ comments and then post excerpts to bug 336.
Your point on not impacting others’ workloads is very well taken. A
non-default runtime option seems like the right way forward.

	Tom    :-)

----

Without taking SYNC snapshots in a SERIALIZABLE transaction I believe
that a Slony-I replica could suffer the same inconsistency dangers that
a pg_dump without --serializable-deferrable can suffer. Namely a replica
would not be usable as a source for reporting. From the 9.4 pg_dump docs:

"This option is not beneficial for a dump which is intended only for
disaster recovery. It could be useful for a dump used to load a copy of
the database for reporting or other read-only load sharing while the
original database continues to be updated. Without it the dump may
reflect a state which is not consistent with any serial execution of the
transactions eventually committed. For example, if batch processing
techniques are used, a batch may show as closed in the dump without all
of the items which are in the batch appearing."

Changing the default isolation levels(s) may therefore change, what a
replica can safely be used for and I believe that creating reports is
one of the major use cases. Using options with big, bold, red, flashing
warnings in the documentation would be the only way to go.


Regards, Jan

----

I was wondering if this is actually possible or not.

The remote slon only selects from  sl_event and sl_log_*.  The remote
worker is only going to see rows that are covered by a snapshot range in
the SYNC in sl_event.  Rows in sl_log_* might be visible from a
transaction point of view but they won't be captured by the where
conditions for pulling from sl_log.

The snapshot for the event in sl_log is done by the local sync
connection which is a read-write connection not by a slon remote
read-only connection.

(I'm ignoring copy_set from the above analysis).

----


	One point to clarify: while a general review of isolation levels could be
helpful, the problem I saw and addressed was specifically in the remote
listener. The remedy I’ve been working with is a one-line change. The
behavior we had without it seems similar to the observation Jan makes in
the bug 336 description.


--- src/slon/remote_listen.c.SAVE	2015-11-12 11:09:06.405693227 -0500
+++ src/slon/remote_listen.c	2015-11-12 11:13:00.157825287 -0500
@@ -318,7 +318,7 @@
			}
			if (PQserverVersion(dbconn) >= 90100)
			{
-				slon_mkquery(&query1, "SET SESSION CHARACTERISTICS AS TRANSACTION
read only isolation level serializable deferrable");
+				slon_mkquery(&query1, "SET SESSION CHARACTERISTICS AS TRANSACTION
read only isolation level repeatable read");
				res = PQexec(dbconn, dstring_data(&query1));
				if (PQresultStatus(res) != PGRES_COMMAND_OK)
				{

	Tom    :-)

----

On Thu, Nov 19, 2015 at 06:09:54PM +0000, Tignor, Tom wrote:
	Thanks for the feedback. Greg, can you describe the transaction handling
changes you’re referring to?

Sorry, my bad - I misremembered the serializable overhaul as happening
in 9.2, not 9.1.

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8