Slony-I 2.0.5 Documentation
Defining the nodes indicated the shape of the cluster of database servers; it is now time to determine what data is to be copied between them. The groups of data that are copied are defined as "replication sets."
A replication set consists of the following:
Tables that are to be replicated
Sequences that are to be replicated
Slony-I needs to have a primary key or candidate thereof on each table that is replicated. PK values are used as the primary identifier for each tuple that is modified in the source system. Note that they can be composite keys composed of multiple NOT NULL columns; they don't need to consist of single fields. There are two ways that you can get Slony-I to use a primary key:
If the table has a formally identified primary key, SLONIK SET ADD TABLE can be used without any need to reference the primary key. Slony-I detects the primary key automatically and uses it.
If the table hasn't got a primary key, but has some candidate primary key, that is, some index on a combination of fields that is both UNIQUE and NOT NULL, then you can specify that key, as shown in the following example.
SET ADD TABLE (set id = 1, origin = 1, id = 42, fully qualified name = 'public.this_table', key = 'this_by_that', comment='this_table has this_by_that as a candidate primary key');
However, once you have come this far, there is little reason not to just declare some suitable index to be a primary key, which requires that the columns involved are NOT NULL, and which will establish a unique index. Here is an example of this:
DROP INDEX my_table_name_col1_col2_uniq_idx;
ALTER TABLE my_table_name ADD PRIMARY KEY (col1, col2);
If your application is not somehow referencing the index by name, then this should not lose you anything, and it gives you the clear design benefit that a primary key has been declared for the table.
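Before swapping the index for a primary key, it can help to confirm that the data already satisfies the constraint. The following is a minimal sketch that reuses the illustrative names from the example above (my_table_name, col1, col2, my_table_name_col1_col2_uniq_idx). It assumes the table is not yet being replicated and that the unique index was created with CREATE UNIQUE INDEX; if it instead backs a UNIQUE constraint, PostgreSQL will refuse the DROP INDEX and the constraint would have to be dropped with ALTER TABLE.

```sql
-- 1. Count rows with a NULL in any would-be key column.
--    ADD PRIMARY KEY will fail unless this is 0.
SELECT count(*) AS null_key_rows
FROM my_table_name
WHERE col1 IS NULL OR col2 IS NULL;

-- 2. Look for duplicate key values.
--    ADD PRIMARY KEY will fail unless this returns no rows.
SELECT col1, col2, count(*) AS copies
FROM my_table_name
GROUP BY col1, col2
HAVING count(*) > 1;

-- 3. Replace the unique index with a primary key in one transaction,
--    so the table is never visible without a uniqueness guarantee.
BEGIN;
DROP INDEX my_table_name_col1_col2_uniq_idx;
ALTER TABLE my_table_name ADD PRIMARY KEY (col1, col2);
COMMIT;
```

Because PostgreSQL DDL is transactional, a failure in step 3 rolls back the DROP INDEX as well, leaving the original index in place.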
Notice that in the SET ADD TABLE example, while you need to specify the namespace for the table (public.this_table), you must not specify the namespace for the key (this_by_that); Slony-I infers the key's namespace from the table.
It is not terribly important whether you pick a "true" primary key or a mere "candidate primary key;" it is, however, strongly recommended that you have one of those instead of having Slony-I populate the PK column for you. If you don't have a suitable primary key, that means that the table hasn't got any mechanism, from your application's standpoint, for keeping values unique. Slony-I may, therefore, introduce a new failure mode for your application, and this also implies that you had a way to enter confusing data into the database.
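To find tables that still lack a declared primary key, a catalog query along the following lines may be useful. This is a sketch against the standard PostgreSQL system catalogs, not something Slony-I provides. It only detects formally declared primary keys, so a table that has a usable candidate key (a unique index on NOT NULL columns) will still be listed.

```sql
-- List ordinary user tables that have no declared primary key.
SELECT n.nspname AS schema_name,
       c.relname AS table_name
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'                       -- ordinary tables only
  AND n.nspname !~ '^pg_'                   -- skip pg_catalog, pg_toast, pg_temp_*
  AND n.nspname <> 'information_schema'
  AND NOT EXISTS (
        SELECT 1
        FROM pg_catalog.pg_index i
        WHERE i.indrelid = c.oid
          AND i.indisprimary                -- a primary key index exists
      )
ORDER BY n.nspname, c.relname;
```

On a database where a Slony-I cluster is already installed, you may also want to exclude the cluster's own schema from the result.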
It will be vital to group tables together into a single set if those tables are related via foreign key constraints. If tables that are thus related are not replicated together, you'll find yourself in trouble if you switch the "master provider" from one node to another, and discover that the new "master" can't be updated properly because it is missing the contents of dependent tables.
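One way to see which tables are tied together by foreign keys, and are therefore candidates for sharing a replication set, is to list the foreign key constraints from the PostgreSQL catalogs. The query below is an illustrative sketch; how you turn its output into set definitions is left to you.

```sql
-- List every foreign key as (referencing table, referenced table).
-- Tables that appear together in a row should normally be replicated
-- in the same set.
SELECT con.conrelid::regclass::text  AS referencing_table,
       con.confrelid::regclass::text AS referenced_table,
       con.conname                   AS constraint_name
FROM pg_catalog.pg_constraint con
WHERE con.contype = 'f'                     -- foreign key constraints only
ORDER BY referenced_table, referencing_table;
```

Chains of references matter too: if a references b and b references c, all three belong together.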
There are also several reasons why you might not want to have all of the tables in one replication set:
The initial COPY_SET event for a large set leads to a long-running transaction on the provider node. The FAQ outlines a number of problems caused by long-running transactions, all of which injure system performance.
If you can split such a large set into several smaller pieces, that will shorten the length of each of the transactions, lessening the degree of the "injury" to performance.
Another issue comes up particularly frequently when replicating across a WAN; sometimes the network connection is a little bit unstable, such that there is a risk that a connection held open for several hours will lead to CONNECTION TIMEOUT. If that happens when the copy of a 50-table replication set consisting of 250GB of data is 95% done, that could ruin your whole day. If the tables were, instead, associated with separate replication sets, that failure at the 95% point might only interrupt, temporarily, the copying of one of those tables.
These "negative effects" tend to emerge when the database being subscribed to is many gigabytes in size and where it takes many hours or even days for the subscriber to complete the initial data copy. For relatively small databases, this shouldn't be a material factor.
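To judge whether splitting is worth the effort, it helps to know how large each table actually is. The following sketch uses the standard PostgreSQL functions pg_total_relation_size and pg_size_pretty to rank user tables by on-disk size, including indexes and TOAST data. Any threshold at which a table deserves its own set is a judgment call and is not defined by Slony-I.

```sql
-- Rank user tables by total on-disk size (table + indexes + TOAST),
-- largest first, to help decide how to divide tables among sets.
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'                       -- ordinary tables only
  AND n.nspname !~ '^pg_'                   -- skip system and temp schemas
  AND n.nspname <> 'information_schema'
ORDER BY pg_total_relation_size(c.oid) DESC;
```

When deciding where to draw the boundaries, remember that tables related by foreign keys should still stay in the same set.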