Steve Singer ssinger_pg
Wed Sep 20 16:50:29 PDT 2006
On Sun, 17 Sep 2006, Manuel Vázquez Acosta wrote:

Sorry it took me so long to get back to you.

Attached is the script heartbeat calls on a failover.

Let me know if you have any questions about the settings in our heartbeat 
config files.

I hope to be able to get out a step-by-step setup guide for this type of 
configuration sometime, but that doesn't seem likely this week.


> Steve,
>
> Thanks for your response. I'd appreciate it if you could send me the
> scripts you wrote. I was thinking about how to deal with the several
> cases that might occur, and how to implement them so they integrate
> smoothly with heartbeat.
>
> We also have the policy of "manually put the cluster back to the
> normal/desired state".
>
> With regard to losing data, we have our concerns and we're trying to
> find a workaround that fits our budget.
>
> Best regards,
> Manuel.
>
> On 9/16/06, Steve Singer <ssinger_pg at sympatico.ca> wrote:
>> On Sat, 16 Sep 2006, Manuel Vázquez Acosta wrote:
>> 
>> I've done it recently.  I don't have access to my notes or any of the
>> config files at the moment but can probably send you more details during
>> the week.
>> 
>> 
>> It involves:
>> 
>> - Setting up slony as per normal with a master and a slave.
>> - Setting up heartbeat to use a virtual IP on the master that your
>>   applications will use when they want to connect to the database.
>> - Writing a script in heartbeat's resource.d that gets called on heartbeat
>>   start events for slony (a sketch of the haresources glue is below).
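>> 
>> To give a concrete picture, the haresources line that ties the virtual IP
>> and the script together looks something like the following (the virtual IP
>> and the script name here are only placeholders, not our actual values):
>> 
>>   postgres1 192.168.151.10 slony_failover
>> 
>> The bare IP address is brought up by heartbeat's built-in IPaddr resource,
>> and slony_failover stands in for the resource.d script that heartbeat calls
>> with start/stop/status.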
>> 
>> The script I wrote supported start and status operations.
>> For the start operation it had two modes:
>> 
>> 1) A controlled switch-over mode that I activated with a touch file in /tmp
>>    (see the example below).  This mode did a MOVE SET, and was for promoting
>>    the slave when the master was still running (i.e. in preparation for
>>    taking the master down for maintenance).
>> 2) A fail-over mode that would execute if heartbeat on the slave detected
>>    that the master went away.
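>> 
>> A controlled switch-over looks roughly like the following (the touch-file
>> path is just the name I used; exactly how you hand resources over, with
>> hb_standby or by simply stopping heartbeat on the master, will depend on
>> your heartbeat setup):
>> 
>>   # on the node being promoted (the current slave)
>>   touch /tmp/CONTROLLED_FAILOVER
>> 
>>   # on the current master, give up its resources
>>   /etc/init.d/heartbeat stop
>> 
>> When heartbeat then starts the resource on the slave, the script sees the
>> touch file and issues a MOVE SET instead of a FAILOVER.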
>> 
>> The script also had a status command that would return whether the database
>> on the local host was the origin for the slony replication set.
>> 
>> You also want to make sure that auto_failback (it might be named something
>> different in your heartbeat version) is turned off, so that if the primary
>> does come back online heartbeat does not try to make it the master again.
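>> 
>> In ha.cf that is a single directive (older heartbeat releases spell the
>> same idea with nice_failback instead, so check your version's docs):
>> 
>>   auto_failback off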
>> 
>> We took the approach that if the system did a fail over then the master was
>> dropped from the replication set, and if it did come back online it would
>> not get automatically added back as a slony slave.  The system would run
>> without a second database until someone manually re-provisioned the primary
>> server as a new slony slave.
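>> 
>> Re-provisioning the old primary is essentially a fresh subscribe.  As a
>> rough outline only (the exact steps depend on your slony version, so check
>> the slony docs before relying on this): clear the leftover _cluster schema
>> out of the failed database, then feed slonik something like the following,
>> where the cluster name, node ids, hosts and user are just example values:
>> 
>>   cluster name = mycluster;
>>   node 1 admin conninfo = 'dbname=mydb host=192.168.151.1 user=slony';
>>   node 2 admin conninfo = 'dbname=mydb host=192.168.151.2 user=slony';
>>   store node (id = 1, comment = 'old primary rejoining as slave', event node = 2);
>>   store path (server = 1, client = 2, conninfo = 'dbname=mydb host=192.168.151.1 user=slony');
>>   store path (server = 2, client = 1, conninfo = 'dbname=mydb host=192.168.151.2 user=slony');
>>   subscribe set (id = 1, provider = 2, receiver = 1, forward = no);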
>> 
>> 
>> As Andrew indicated, if your primary server goes down and the secondary
>> becomes the master, then any data on the primary that hadn't yet made it to
>> the secondary server at the time of failover might be lost.  You would still
>> be able to access the database on the primary server (once you fix whatever
>> went wrong in the first place), but slony won't be able to send the
>> unreplicated data out.
>> 
>> I don't have any experience with failures in a production environment to
>> say how well it works.
>> 
>> 
>> 
>> Steve
>> 
>> > Hi all,
>> >
>> > I'm trying to set up an Active/Passive configuration for PostgreSQL
>> > using Slony and Heartbeat.
>> >
>> > I'm wondering if anyone has done it before and could give me some ideas.
>> >
>> > Best regards
>> > Manuel.
>> 
>> 
>
-------------- next part --------------
#!/bin/bash
#
# Slony failover resource script for heartbeat.  Lives in resource.d and is
# called by heartbeat with "start", "stop" or "status".
#
logger "$0 called with $1"
HOSTNAME=`uname -n`

NODE1_HOST=192.168.151.1
NODE2_HOST=192.168.151.2
slony_USER=slony
slony_PASSWORD=slony
DATABASE_NAME=mydb
CLUSTER_NAME=mycluster
PRIMARY_NAME=postgres1


#
# Returns 1 (TRUE) if the local database is the master, i.e. the origin of
# the slony replication set.  The query counts the sets whose origin is the
# local slony node id.
#
is_master () {
    export PGPASSWORD=$slony_PASSWORD
    RESULT=`psql $DATABASE_NAME -h localhost --user $slony_USER -q -t <<_EOF_
SELECT count(*) FROM _$CLUSTER_NAME.sl_set
WHERE set_origin = _$CLUSTER_NAME.getlocalnodeid('_$CLUSTER_NAME');
_EOF_`

    return $RESULT;
}

case "$1" in
start)
    is_master;
    IS_MASTER=$?
    if [ $IS_MASTER -eq 1 ]; then
        # Already the master.  Nothing to do here.
        echo "The local database is already the master"
        exit 0;
    fi

    # Work out which slony node id is the old master and which is the old
    # slave, based on which host this script is running on.
    if [ "$HOSTNAME" == "$PRIMARY_NAME" ]; then
        OLD_MASTER=2
        OLD_SLAVE=1
    else
        OLD_MASTER=1
        OLD_SLAVE=2
    fi

    if [ -f /tmp/CONTROLLED_FAILOVER ]; then
        # Controlled switch over: the old master is still up, so do a clean
        # MOVE SET to the new origin.
        slonik <<_EOF_
cluster name=$CLUSTER_NAME;
node 1 admin conninfo = 'dbname=$DATABASE_NAME host=$NODE1_HOST user=$slony_USER password=$slony_PASSWORD';
node 2 admin conninfo = 'dbname=$DATABASE_NAME host=$NODE2_HOST user=$slony_USER password=$slony_PASSWORD';
lock set (id=1, origin=$OLD_MASTER);
wait for event (origin=$OLD_MASTER, confirmed=$OLD_SLAVE);

move set (id=1, old origin=$OLD_MASTER, new origin=$OLD_SLAVE);
wait for event (origin=$OLD_MASTER, confirmed=$OLD_SLAVE);
_EOF_
    else
        # Real failover: the old master is gone, promote the backup node.
        slonik <<_EOF_
cluster name=$CLUSTER_NAME;
node 1 admin conninfo = 'dbname=$DATABASE_NAME host=$NODE1_HOST user=$slony_USER password=$slony_PASSWORD';
node 2 admin conninfo = 'dbname=$DATABASE_NAME host=$NODE2_HOST user=$slony_USER password=$slony_PASSWORD';
failover (id=$OLD_MASTER, backup node=$OLD_SLAVE);
_EOF_
    fi;
    ;;

stop)
    # stop commands go here
    ;;

status)
    # If the local database reports itself as the master then status is 0,
    # otherwise status is 3.
    is_master;
    RESULT=$?
    if [ "$RESULT" -eq "1" ]; then
        echo "Local database is the master"
        exit 0
    else
        echo "Local database is a slave"
        exit 3
    fi
    ;;
esac
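
# Deployment note (an assumption about the surrounding setup, adjust as
# needed): this script is meant to live in heartbeat's resource.d directory
# (e.g. /etc/ha.d/resource.d/), be executable on both nodes, and be listed
# as a resource in /etc/ha.d/haresources after the virtual IP, so that
# heartbeat invokes it with start/stop/status during a takeover.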





