[Slony1-general] service needs (was: Migrating From gBorg)

Tue Sep 19 12:54:35 PDT 2006

I'm going to take this list, and expand it some, in point form.

On Tue, Sep 19, 2006 at 10:27:03AM -0700, Darcy Buskermolen wrote:
> Ok well then at this end, let's all start compiling a list of requirements for 
> the slony project.
> 
> 1) Reliable, redundant, distributed infrastructure (manpower, hardware, 
> bandwidth)

I.   On service availability

A.  Reliable infrastructure

1.	Service uptime (!= individual server uptime, maybe)

2.	Network uptime

3.	In case of service failure, individuals must be available to
solve the problem.

	a.  there should always be at least two people in the project
who know how to repair any given service

	b.  there should always be at least two such people who have the
access rights to repair any given service

	c.  there should always be at least two such people who have
the authority to decide to repair any given service

	d.  ideally, the "at least two people" principle above means
"at any one time".  So when people take vacation, are offline, &c.,
someone else should be able to step in.  
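
As a rough illustration of how (a) through (d) could be checked
mechanically, here is a minimal Python sketch.  Everything in it is
hypothetical: the Person record, its field names, and the
coverage_gaps helper are made up for the example, and it assumes one
reading of "such people" in (b) and (c), namely that those people are
drawn from the ones who know how to repair the service.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical record of one project member (not an existing tool)."""
    name: str
    available: bool = True                          # False on vacation, offline, &c.
    knows: set = field(default_factory=set)         # services they can repair
    has_access: set = field(default_factory=set)    # services they have rights on
    may_decide: set = field(default_factory=set)    # services they may decide to repair


def coverage_gaps(service, people, minimum=2):
    """Return which of the three rules fall below `minimum` right now.

    Assumption: rules (b) and (c) count only people who also know how
    to repair the service, and rule (d) is modelled by ignoring anyone
    who is currently unavailable.
    """
    able = [p for p in people if p.available and service in p.knows]
    counts = {
        "knowledge": len(able),
        "access": sum(service in p.has_access for p in able),
        "authority": sum(service in p.may_decide for p in able),
    }
    return [rule for rule, n in counts.items() if n < minimum]
```

Calling coverage_gaps("mailing-list", people) with a list of such
records returns an empty list when all three rules hold, and the
names of the unmet rules otherwise; marking someone unavailable shows
what a vacation does to coverage.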

B.  Service redundancy and distribution: how to achieve reliability.

1.	To the extent technically feasible, every service should be
delivered from at least two machines.

2.	To the extent technically feasible, every project-critical
service should be delivered from at least two geographically and
topologically distributed locations.

3.	To the extent that (1) and (2) are not technically feasible,
"warm standby" systems should be prepared for failover conditions.

C.	Policy and high availability

1.	Services should be classified according to "critical",
"valuable", &c. (or some such similar scale); and each of these
levels should have some planned level of response time to failures. 
(An initial suggestion is "24/7" and "12/5" service levels, but
I'm open to suggestions here.)

2.	A communication plan for failures is at least as important as
the ability to fix problems: a well-communicated failure with
information well-distributed to the community will cause less damage
than one poorly acknowledged.

3.	Predictable outages of longer duration are preferable to
unpredicted outages of any duration.

4.	Occasional scheduled "fire drills" should be conducted to
test the viability of infrastructure plans.
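
To make the classification in (C.1) concrete, here is a minimal
Python sketch of mapping service classes to response windows.  The
mapping of "critical" to 24/7 and "valuable" to 12/5 is only the
initial suggestion above, and the 08:00-20:00, Monday-to-Friday
window for 12/5 is purely an assumption for the example; the names
SERVICE_LEVELS and is_covered are hypothetical.

```python
from datetime import datetime

# Hypothetical mapping of service class to response level.
SERVICE_LEVELS = {
    "critical": "24/7",
    "valuable": "12/5",
}


def is_covered(service_class, when):
    """Return True if a failure at `when` falls inside the response window.

    Assumption: "12/5" means 08:00-20:00 local time, Monday to Friday.
    """
    level = SERVICE_LEVELS[service_class]
    if level == "24/7":
        return True
    if level == "12/5":
        # weekday() is 0 for Monday through 6 for Sunday
        return when.weekday() < 5 and 8 <= when.hour < 20
    raise ValueError(f"unknown service level: {level}")
```

A table like this would also give the communication plan in (C.2)
something definite to state: whether a given failure is inside or
outside the promised response window.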

> 2) visible integration with PostgreSQL and other components

Would a "replication for PostgreSQL home" be helpful here?  Like a
www.postgresql.org/replication home site that included Slony,
pgcluster, pgpool, &c?

> 3) Increased usability (newbie) documentation
> 4) Increased usability (newbie) tools 

I think these are important, but for an infrastructure discussion,
presumably what we want is the ease of delivering these: 

III.	Easy documentation maintenance

A.	The ability to deliver user-friendly documentation for new
users

B.	Integration of documentation with the main web site

C.	Easy maintenance by community, so that no individual is a
potential "blocker" on documentation updates

IV.	Easy delivery of tools

A.	New-user tools 

B.	Test infrastructure

I'm sure this isn't everything, but please feel free, all of you, to
rip into me.

A

-- 
Andrew Sullivan  | ajs at crankycanuck.ca
The plural of anecdote is not data.
		--Roger Brinner