Single point of failure


March 16, 2009 by temebele


Last week, on a Sunday afternoon, I was reading an MSDN article about failover clusters and single points of failure. A single point of failure (SPOF) is a part of a system which, if it fails, stops the entire system from working. High availability is very important for a business like ours, where customers depend on our online services. The only way to achieve this high availability is to avoid those single points of failure, and we do this by building redundancy into our systems. I’m going to leave it to our Infrastructure team to talk about the details of our high availability architecture. But something else clicked in my head after I finished reading the article: the single point of failure in a software development team.

Even though the concept of a single point of failure is mostly used for devices and networking systems, it isn’t too big of a stretch to apply it to a development team. Have you ever thought about a developer being a single point of failure for their company’s continued success? Here is my take on it. In software development, domain knowledge is not something we get overnight. It is something we develop over time as we keep working on projects and interacting with the domain experts in that area. It may take months, or sometimes years, to become an expert in a specific domain. Now here is where the problem arises. As developers gain domain knowledge that is critical to the growth of the business, we tend to keep that knowledge to ourselves and not share it fully with fellow team members. If you have come across hallway discussions like “…this new app is going to be my job security” or “…don’t think I’ll be able to take my vacation because nobody else understands how this application works” or “…this system is too complicated, I wish other developers wouldn’t mess with it” or “…my project has the most visibility to management”, then you probably understand what I’m talking about. But one thing we all forget is that we are setting ourselves up to be the single point of failure for the system we are building.
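To make the idea a little more concrete, here is a rough sketch of how a team might spot its own single points of failure. It assumes you can write down who is able to support which system; the function name `find_spofs` and the team and system names are all made up for illustration and do not describe any real team or tool.

```python
from collections import defaultdict


def find_spofs(knowledge):
    """Return {system: person} for every system only one person understands.

    `knowledge` maps each person to the set of systems they can support.
    """
    experts = defaultdict(set)
    for person, systems in knowledge.items():
        for system in systems:
            experts[system].add(person)
    # A system with exactly one expert depends entirely on that person.
    return {
        system: next(iter(people))
        for system, people in experts.items()
        if len(people) == 1
    }


# Hypothetical team, invented for this example.
team = {
    "alice": {"billing", "reports"},
    "bob": {"reports", "search"},
    "carol": {"search"},
}

print(find_spofs(team))  # {'billing': 'alice'}
```

In this made-up team, billing stops moving the day Alice is out, while reports and search each have a second person who can step in. Who can support what is a judgment call in practice, so treat the output as a conversation starter rather than a measurement.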

I sat for a while and tried to think through the pros and cons of being that single point of failure, the single go-to person for a system. It is undeniably true that being the only person who fully understands critical systems makes us more important. But I believe the disadvantages outweigh that. First of all, it contradicts our main goal: making the business profitable. A person’s absence is even harder to predict than a device’s failure. We may be forced to stay away from work for many unexpected reasons, like sickness or the death of a loved one. It would be a big failure on our part if our absence created chaos in the normal day-to-day activity and kept the company from continuing to grow. Being that single person will also prevent us from fully enjoying our hard-earned vacations.

Another disadvantage is that if you don’t share, you won’t get the opinions of other team members, so the system you develop will only be as good as your own thinking. We may also make other developers on the team feel left out. In this respect, I’m very appreciative of a project manager I worked with who understood the effects of SPOF and always made sure that all team members had a stake in each other’s projects. Being a single point of failure also puts a big burden of responsibility on our shoulders. We can relax a bit only when we know that a fellow team member can do the same work we do with a little guidance. We also shouldn’t forget that software systems always change. The system we are so proud of may become obsolete and be completely replaced with a new one. Not only that, but software developers tend to abandon the good old existing application and create a brand new one when they find it hard to learn how the existing system works.

To summarize, I think it is a great quality to develop domain knowledge and expertise in the different areas of the business. The business needs experienced people who can take responsibility when something happens. It is also completely acceptable that you become more important as you gain experience and skills. But when you start feeling too important and the business hinges on you, then you are probably setting yourself up to be the SPOF. And being the SPOF is going to have a negative impact on both you and the business. Developers who have experience in the different areas should always be open to sharing with and mentoring fellow developers on their team, who can then fill the gap in their absence. That is the only way to continued growth and success in a .COM company where developers mean a lot to the business.

So next time you smell these symptoms in someone, whisper in their ear in a friendly voice, “My friend, this smells like SPOF and it is not good”. Even though this article focused primarily on software development teams, every one of you can take the concept and apply it in the different teams that you are part of.


2 thoughts on “Single point of failure”

  1. This is a really good point. I sense, however, that it is at least as important for business managers to understand this as it is for the operational developers and system admins themselves.

    It is certainly great for operational staff to take such a point to heart and solve the problem. However, they may be an SPOF for reasons that prohibit them from realizing there is a problem in the first place, or even caring at all.

    They may be ineffective delegators, for example. They always put off delegation or training other staff because they think they can just get it done quicker if they do it themselves. This is usually true, of course – they can. But in the end, nobody else learns the job.

    They may also simply not have the wherewithal to document, train, or teach without some outside pressure to do so. The “wherewithal” could be any number of things, from the motivation to train, teach, and document to an understanding of the importance of doing those things.

    It is entirely possible, though almost criminal, for a staff member to relish the role of a SPOF because they know it is job security. It is hard to spot this type of person because, even if it were true of them, they are at least smart enough not to admit it. They can’t be fired and they know it. And sadly, they probably need to be fired, but can’t be.

    I would add to your reminder an emphasis for business managers to consider it also. Without a little top-down guidance, some individuals may never build the redundancy around themselves that the company really requires.

  2. temebele says:

    Great comment, Cody. You completed the message of my article.
