Server Plans for 2008
Summary
The process of becoming one of the leading (and largest) Business VoIP Providers in 2007 has taught us volumes about how to engineer our services. In addition, our customer base and server loads have grown, and we need a new platform for delivering email and web services that fixes the design flaws now apparent in our current (2004) platform.
Using server virtualisation and greater amounts of replicated storage, on top of a new core network, we believe we have developed a blueprint for a 2008 enterprise-class infrastructure. We have nearly completed prototyping this and are looking forward to expanding its rollout.
Introduction
Gradwell’s customer base has continued to expand and we have spent a lot of time in 2007 thinking about and exploring how to build a new platform for email and web hosting delivery.
This blog post sets out our current thinking on our infrastructure plans for 2008, and I would welcome feedback, either using the comment box below or via email to peter@gradwell.com.
Our web and email hosting infrastructure was largely built in 2003/2004 and whilst much of the physical equipment has been maintained, replaced, scaled and upgraded, it uses design concepts from the earlier years which are now beginning to appear dated.
In addition, usage of internet hosting has grown so that today many of our customers run significant businesses via their internet presence. Equally, the pressures on the London datacentres have grown, so that they are operating at higher loads and, in some instances, higher temperatures.
The largest change in mentality has been in the handling of minor faults. Our VoIP experiences teach us that customers can tolerate something being off/down - what they dislike is the uncertainty and instability of not knowing whether it is on or off. Recurring intermittent faults are worse than big outages, and so we have to close the gap to 100% uptime, whilst remaining a relatively low cost provider of services.
Looking back at 2007, we have also begun to appreciate that we need to build an infrastructure that is resilient to supplier failures and, importantly, to intermittent supplier faults. We would not previously have considered the need to cope with datacentre failure - but datacentres do have operational problems that are not big enough to be a disaster, yet still cause unnecessary disruption.
Finally, it may seem that the answer to these problems is simply to use premium equipment and facilities. However, this is not the case: we already use two of the best London datacentres (Telehouse and Telecity Sovereign House), and whilst we do use some premium grade servers, our business model does not support buying IBM for everything. (Three years ago, which is when some of our servers date from, many of the redundancy features we have today in our HP ProLiant servers simply did not exist.)
The Challenge
So, things have moved on greatly and therefore we set about in late summer trying to find a workable blueprint for our next iteration of server facilities. The challenge is to guard against the following:
- Hard Disk failure, and the performance impact caused by the subsequent rebuild
- Air conditioning failure - causing equipment to power down
- Software Failure of an individual system that many things are dependent upon
- Physical Server failure, e.g. RAM, CPU, Power Supply
In addition, we need to:
- Maintain a large number of our existing server software configurations and operating systems (because customers have software written to support those), whilst replacing the physical hardware.
- Reduce the idle power consumption of our servers, because it is expensive and wasteful.
- Balance our server load more evenly. At present, we have some very busy servers and some very idle ones.
Physical Improvements
We have made a number of improvements to our systems and network in 2007, to lay the groundwork for this project. This predominantly covered the deployment of physical equipment:
- New switched network topology and the first phase of some core switch upgrades on our Telehouse LAN
- Deployment of a new Juniper Network Router platform, using 5 Juniper J4350 Routers.
- 24 low-voltage, quad-core 1.86 GHz HP DL380 servers, with local SAS-based storage, split across Telehouse (TH) and Sovereign House (Sov).
- Additional disk storage, using 3 Infortrend iSCSI SAN arrays, with 16×250, 16×500 and 16×750 GB SATA disks, again split between TH and Sov.
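To put the new storage in context, the raw capacity of the three arrays can be worked out from the disk counts above. The sketch below is illustrative only: the disk counts and sizes come from the list, but the usable figure assumes each array is a single RAID 6-style group losing two disks to parity, which is an assumption and not a statement of how the arrays are actually configured.

```python
# Disk counts and sizes are taken from the list above.
# The parity layout is an ASSUMPTION made purely for illustration.
ARRAYS = [(16, 250), (16, 500), (16, 750)]  # (number of disks, size in GB)


def raw_gb(disks, size_gb):
    """Raw capacity of one array, before any RAID overhead."""
    return disks * size_gb


def usable_gb(disks, size_gb, parity_disks=2):
    """Usable capacity if the array is one group with `parity_disks` of parity."""
    return (disks - parity_disks) * size_gb


total_raw = sum(raw_gb(d, s) for d, s in ARRAYS)
total_usable = sum(usable_gb(d, s) for d, s in ARRAYS)

print(f"raw: {total_raw} GB, usable (assumed double parity): {total_usable} GB")
```

That gives 24,000 GB raw across the three arrays, and 21,000 GB under the assumed double-parity layout, before any mirroring between sites.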
Physically, in early 2008, we have also planned:
- Migrate our Telehouse switch core to a Cisco switch platform (from HP) in a fully redundant configuration, and make the Sovereign House switch fabric more redundant.
- Increase the RAM on our HP DL380 servers.
- Add a second London inter-datacentre fibre link.
Software and Management Improvements
Adding equipment solves some issues, but creates others. For example, our current web server environment (FreeBSD, Feb 2004) won’t run on the HP servers, and we could not migrate customer websites in one go to a new environment.
We also have many new services that we wish to deploy, and we plan to do those in a redundant fashion, potentially leaving a myriad of server infrastructure in place, but unused until we ramp up the customer base.
Therefore, we have been experimenting with a number of techniques to resolve this problem, mainly based on virtualisation, and have built an initial environment using VMware in which we have been testing and evaluating. Results so far have been very positive.
Moving to a virtualised environment brings us the following benefits:
- We will separate software systems from hardware and will be able to migrate “servers” from one physical environment to another without downtime.
- We can migrate legacy operating systems onto new hardware without significant reconfiguration.
- We can deploy many more software instances and operating system environments in our virtualised environment than we could with physical servers, allowing us to “double up” on all services and processes.
- We can further delegate the operation of our services to software automation, allowing for even faster response to problems.
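Being able to move “servers” between physical machines is also what makes it practical to even out the busy and idle hosts mentioned earlier. The following is a minimal sketch of one simple way to do that, a greedy "largest first, onto the least loaded host" placement. It is not how VMware itself places machines, and the VM names and load figures are hypothetical.

```python
import heapq


def place(vm_loads, host_count):
    """Greedy placement: take VMs largest first, put each on the least loaded host."""
    heap = [(0, host) for host in range(host_count)]  # (current load, host id)
    heapq.heapify(heap)
    placement = {host: [] for host in range(host_count)}
    ordered = sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True)
    for name, load in ordered:
        total, host = heapq.heappop(heap)
        placement[host].append(name)
        heapq.heappush(heap, (total + load, host))
    return placement


# Hypothetical loads, expressed as a percentage of one host's capacity.
vm_loads = {"web1": 60, "web2": 50, "imap1": 40, "spam1": 30, "dns1": 10, "dns2": 10}

placement = place(vm_loads, host_count=2)
host_totals = {h: sum(vm_loads[v] for v in vms) for h, vms in placement.items()}
print(placement)
print(host_totals)
```

A greedy pass like this is not optimal in general, but it is cheap to run and easy to repeat whenever loads change, which suits an environment where moving a machine no longer means downtime.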
Improving Storage
One of our remaining key challenges is how to improve our storage, so that our new infrastructure platform does not collapse when a disk fails. Firstly, future developments in VMware due in 2008 will allow us to move server file systems from one machine to another.
With regard to customer file storage, we have spent time evaluating both the expensive options (http://www.equallogic.com/) and the less expensive ones (sadly, there is no cheap solution!), for example http://www.datacore.com/products/prod_SANmelody.asp.
Our key requirement for storage is that we have high performance arrays, which can suffer multiple disk failure and rebuild without impacting live application performance, but on which we can cost effectively mirror and replicate the data across two datacentres (protecting against aircon and power failure).
Having deployed additional new storage, we are now progressing with our evaluation of the best option for management and backup.
In addition, we have also been working on the mechanism for distributing files from the hard disks to the web servers. Currently we use NFS,
Our Server Platform
Readers may be interested in the following brief summary of our anticipated server platform:
- Front end web hosting servers x8
- Caching DNS Servers x2 sets of 3
- Primary DNS Servers x2 sets of 3
- Call Routing DNS Servers x4
- Load Balanced servers for Email x4
- IMAP (Online Email Folder) servers x4
- POP3 servers x4
- Email virus scanners (ClamAV) x10
- Email spam filters (SpamAssassin) x10
- Email forwarding servers (gwh) x10
- Email outbound relay (Exim) x4
- Email inbound relay (gwh) x4
- Email Quarantine database x2
- Email Logging x2
- Email List Distribution Servers x2
- Jabber Instant Messaging servers x2
- MySQL database servers x3, plus MySQL Cluster x4
- Zimbra Collaboration and Email servers x5
- Usenet servers x3
VoIP Servers
- In/outbound Asterisk call processing servers (for iax + newsip) x10
- SIP Registration x2
- SIP Call routing x3
- MySQL database cluster x4 for newsip to back off on
- Prepay permissions servers x3
- Voicemail servers x3
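To give a feel for the scale of this list, the sketch below simply adds up the instance counts above and compares them with the 24 HP DL380 servers mentioned earlier. It reads "x2 sets of 3" as six instances, and it assumes, purely for illustration, that every listed instance would run as a virtual machine on those 24 hosts; neither of those readings is confirmed above.

```python
# Instance counts copied from the lists above.
# "x2 sets of 3" is read as 2 * 3 = 6 instances (an interpretation).
HOSTING = {
    "front end web hosting": 8,
    "caching DNS": 2 * 3,
    "primary DNS": 2 * 3,
    "call routing DNS": 4,
    "email load balancers": 4,
    "IMAP": 4,
    "POP3": 4,
    "virus scanners": 10,
    "spam filters": 10,
    "email forwarding": 10,
    "outbound relay": 4,
    "inbound relay": 4,
    "quarantine database": 2,
    "email logging": 2,
    "list distribution": 2,
    "Jabber": 2,
    "MySQL (3 servers + 4 cluster nodes)": 3 + 4,
    "Zimbra": 5,
    "Usenet": 3,
}
VOIP = {
    "Asterisk call processing": 10,
    "SIP registration": 2,
    "SIP call routing": 3,
    "MySQL cluster": 4,
    "prepay permissions": 3,
    "voicemail": 3,
}

# ASSUMPTION: all instances share the 24 DL380 hosts mentioned earlier.
PHYSICAL_HOSTS = 24

hosting_total = sum(HOSTING.values())
voip_total = sum(VOIP.values())
grand_total = hosting_total + voip_total
per_host = grand_total / PHYSICAL_HOSTS

print(hosting_total, voip_total, grand_total, round(per_host, 1))
```

On those assumptions the list comes to 97 hosting instances and 25 VoIP instances, 122 in total, or roughly five instances per physical server.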
Datacentre Issues
Over 90% of the UK Internet “population” is within London and we will always need to have a well connected presence there. However, it is increasingly apparent that we can operate more of our services from servers physically located outside of London, and indeed, our most recent Telecoms interconnects have been at a point of presence in Leeds.
With regard to web and email hosting, we have identified Edinburgh as a suitable location and have completed business planning and agreed funding for expansion into Edinburgh in the first half of 2008. This is dependent on our having successfully completed the migration to our VMware platform in London.
How does this help us deploy new services?
We have a number of plans for new hosted services, including improved email with mobile integration (Zimbra) and online secure messaging using Instant Messaging (Jabber). We also want to expand our web hosting again, to support new languages (e.g. new versions of Ruby) and more hosted web applications.
By being able to deploy new services in a fully redundant configuration, without having to spend significant sums on physical resources (e.g. VMware will let us set up 5 Zimbra servers, but only provide the physical memory when needed), we can build new services more quickly and to a higher standard.
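The Zimbra example can be illustrated with a little arithmetic. The sketch below is a deliberately simplified model in which the host only backs the memory each virtual machine is actively using; the 4 GB per server and the activity figures are hypothetical, chosen only to show the shape of the saving.

```python
# All figures here are HYPOTHETICAL; no memory sizes are given above.
CONFIGURED_GB = 4  # memory each Zimbra VM is configured with (assumed)
INSTANCES = 5      # number of Zimbra servers, from the example above


def physical_needed(active_gb_per_vm):
    """Physical memory in use under a simplified 'back only what is used' model."""
    return sum(min(active, CONFIGURED_GB) for active in active_gb_per_vm)


configured_total = CONFIGURED_GB * INSTANCES
quiet = physical_needed([1] * INSTANCES)             # few customers yet
busy = physical_needed([CONFIGURED_GB] * INSTANCES)  # fully ramped up

print(configured_total, quiet, busy)
```

In this model the five servers are configured for 20 GB in total but occupy only 5 GB while quiet, growing towards the full 20 GB as the customer base ramps up.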
Conclusions
As we have developed, in 2007, into one of the UK’s leading VoIP platform operators, we have identified a number of areas in which we could improve our email and web hosting product platforms. In doing so, we also wanted to solve a number of the problems that are faced by both ourselves and our peers in the industry, and end up with a blueprint that was suitable for our continued and rapid future expansion.
Whilst a number of questions remain unanswered and on the agenda for January 2008, we hope that this review of the work done in 2007 has assisted customers in their understanding of our plans for supporting their future growth.