Wed Sep 20 10:18:47 PDT 2006
- Previous message: [Slony1-general] service needs
- Next message: [Slony1-general] Documentation Improvements
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 9/19/06, Andrew Sullivan <ajs at crankycanuck.ca> wrote:
> I'm going to take this list, and expand it some, in point form.
>
> On Tue, Sep 19, 2006 at 10:27:03AM -0700, Darcy Buskermolen wrote:
> > Ok, well then, at this end let's all start compiling a list of
> > requirements for the slony project.
> >
> > 1) Reliable, redundant, distributed infrastructure (manpower,
> >    hardware, bandwidth)
>
> I. On service availability
>
>    A. Reliable infrastructure
>
>       1. Service uptime (!= individual server uptime, maybe)
>
>       2. Network uptime

s/uptime/availability/ ?  If we differentiate between services using
tiers, then network uptime becomes implicit.  For example:

   i.   First-tier services include DNS resolution, essential (perhaps
        static) web pages, and backups / off-site synchronization /
        replication
   ii.  Second-tier services include CVS (or whatever source control
        system is selected) and maybe mailing lists
   iii. Third-tier services include non-static web pages and
        mailing-list archives

I see you're aiming at something like this below, but maybe
differentiating the tiers up front makes the document easier to
maintain?

>       3. In case of service failure, individuals must be available to
>          solve the problem
>
>          a. there should always be at least two people in the project
>             who know how to repair any given service
>
>          b. there should always be at least two such people who have
>             the access rights to repair any given service
>
>          c. there should always be at least two such people who have
>             the authority to decide to repair any given service
>
>          d. ideally, the "at least two people" principle above means
>             "at any one time".  So when people take vacation, are
>             offline, &c., someone else should be able to step in.

          e. there should be a clear, obvious way to determine, in the
             event of a failure, who gets contacted first, second, etc.,
             and how they are to be contacted (to the extent that this
             is possible without getting the attention of spammers and
             their ilk).

>    B. Service redundancy and distribution: how to achieve reliability.
>
>       1. To the extent technically feasible, every service should be
>          delivered from at least two machines.
>
>       2. To the extent technically feasible, every project-critical
>          service should be delivered from at least two geographically
>          and topologically distributed locations.

I think that redundant locations are certainly desirable.  I don't know
whether they're reasonably achievable, but we need to save discussion
about how to do it until after we've decided what we want to do.

>       3. To the extent that (1) and (2) are not technically feasible,
>          "warm standby" systems should be prepared for failover
>          conditions.

          4. To the extent technically feasible, Slony should be used
             to implement this redundancy.  For example, if the
             documentation is in the form of a Postgres-driven wiki, it
             would be excellent to document the installation and
             operation of that wiki as a practical example of best
             practices in action for newbies.  (I'm working on the
             assumption that DB-driven CMSes are common enough, useful
             enough, and simple enough to be interesting to newbies.)

>    C. Policy and high availability
>
>       1. Services should be classified according to "critical",
>          "valuable", &c. (or some such similar scale); and each of
>          these levels should have some planned level of response time
>          to failures.  (An initial suggestion is "24/7" and "12/5"
>          service levels, but I'm open to suggestions here.)
>
>       2. A communication plan for failures is at least as important as
>          the ability to fix problems: a well-communicated failure with
>          information well-distributed to the community will cause less
>          damage than one poorly acknowledged.

Chris' comments about pulling the subscriber list seem reasonable.
However, I think we need to stick with getting our requirements down
first; then we can start figuring out implementation.

>       3. Predictable outages of longer duration are preferable to
>          unpredicted outages of any duration.
>
>       4. Occasional scheduled "fire drills" should be conducted to
>          test the viability of infrastructure plans.
>
> > 2) Visible integration with PostgreSQL and other components
>
> Would a "replication for PostgreSQL home" be helpful here?  Like a
> www.postgresql.org/replication home site that included Slony,
> pgcluster, pgpool, &c?

While clearly valuable and important, I'm not sure how this relates
directly to slony project infrastructure.  Do we want to offer to
share hosting with other PostgreSQL replication solutions?

> > 3) Increased usability (newbie) documentation
> >
> > 4) Increased usability (newbie) tools
>
> I think these are important, but for an infrastructure discussion,
> presumably what we want is the ease of delivering these:
>
> III. Easy documentation maintenance
>
>    A. The ability to deliver user-friendly documentation for new
>       users
>
>    B. Integration of documentation with the main web site
>
>    C. Easy maintenance by the community, so that no individual is a
>       potential "blocker" on documentation updates

   D. Aggressively encourage and support feedback, especially from new
      users, so that the quality of the documentation can constantly
      be improved.

> IV. Easy delivery of tools
>
>    A. New-user tools
>
>    B. Test infrastructure
>
> I'm sure this isn't everything, but please feel free, all of you, to
> rip into me.

Me too.  </aol>

Drew
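Purely to make the tier idea from the message above concrete, here is a minimal sketch of such a classification as a small data structure. The service names and tier assignments are illustrative placeholders, not decisions the project has made:

```python
# Illustrative sketch of the proposed service tiers.  Names and tier
# assignments are placeholders taken from the discussion, not decisions.
SERVICE_TIERS = {
    "dns": 1,            # first tier: name resolution must always work
    "static-web": 1,     # first tier: essential (static) web pages
    "backups": 1,        # first tier: off-site sync / replication
    "cvs": 2,            # second tier: source control
    "mailing-lists": 2,  # second tier
    "dynamic-web": 3,    # third tier: non-static web pages
    "list-archives": 3,  # third tier: mailing-list archives
}

def services_in_tier(tier):
    """Return the services assigned to a given tier, sorted by name."""
    return sorted(name for name, t in SERVICE_TIERS.items() if t == tier)
```

Keeping the tier map in one place like this would make it cheap to answer "what must survive a given outage" and to maintain the document as services are added.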
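Likewise, the "24/7" / "12/5" response-level suggestion in point C.1 could be captured in a small table. This is a sketch under the assumption that "12/5" means 08:00-20:00, Monday through Friday; the labels and hours are placeholders open to the same discussion:

```python
from datetime import datetime

# Hypothetical response-time windows for the suggested service levels:
# "critical" is covered around the clock (24/7); "valuable" is covered
# 08:00-20:00, Monday-Friday (12/5).  Hours and labels are placeholders.
SERVICE_LEVELS = {
    "critical": {"hours": range(0, 24), "weekdays": range(0, 7)},
    "valuable": {"hours": range(8, 20), "weekdays": range(0, 5)},
}

def covered(level, when):
    """True if a failure at `when` falls inside the planned response window."""
    window = SERVICE_LEVELS[level]
    # datetime.weekday(): Monday == 0 ... Sunday == 6
    return when.hour in window["hours"] and when.weekday() in window["weekdays"]
```

For example, a critical service failing at 03:00 on a Saturday is still inside its window, while a "valuable" service failing that same morning would wait until Monday.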