Who is Altan Khendup?

A professional technologist who dabbles in innovative and interesting uses of technology, Mongolian history, philosophy, and cooking ethnic foods.

Often described as part philosopher, scholar, technologist, and mentor, Altan enjoys engaging in stimulating conversations with professionals, tackling problems with technology in a hands-on and collaborative manner, and spending time with good friends and family.

Tuesday, May 26, 2009

Be prepared for the unexpected

One of the most common and variable factors in working in technology is dealing with how unpredictable things can become within a complex application. There are plenty of best practices for managing change, tracking metrics, and so on. In the end, however, some events fall entirely outside common experience.

For example, large datasets, typically stored in relational databases, are commonplace in today's enterprise. There are other kinds of data stores, but this is the type I want to focus on. Very few companies build their own RDBMS, so they adopt one from a known vendor such as Oracle, IBM, Microsoft, or Sun. The logic is sound: these products are well known to the industry, they have vast pools of experts that can be leveraged to achieve goals, and their best practices are well documented. The vendors also offer comprehensive, if somewhat expensive, support options in case an enterprise runs into problems.

The issue tends to be storing large datasets in these products. No matter how well maintained or well planned, something unpredictable will eventually happen that causes a smoothly running product to fail. It can be as simple as a change in how the data is used, the amount of data, the frequency of activity, a data-volume threshold reached at a given moment, newly introduced code, a routine maintenance activity; you name it. When such an event occurs, problems follow. Metrics and monitoring are designed to help identify abnormal behavior patterns, but only when used in concert with business and operational metrics. Such a complementary mesh of information can help proactively identify potential issues. Yet potential does not mean guaranteed. In certain cases you may have little to no warning when something goes wrong. It just does.

It is this core event that makes internal technology groups cringe. Why? Because over the years, a very large majority of these groups have not developed processes to handle problems as they appear. They firmly believe that planning will prevent such issues. While I agree that planning helps a great deal in managing a good portion of issues, it is NOT a guarantee of total prevention. In fact, failure and its consequences should always be factored into any running application. Very few companies plan this effectively from a holistic viewpoint.

Some of the most common assumptions I have seen in large enterprises that leave teams unable to deal with failure are:

  • Overly specialized silos. Each function is so specialized that a single naturally occurring issue is assigned to many groups at once, creating a significant communications challenge. Take a server with a problem. In some companies a server is broken down into components: software, operating system, middleware, network, and hardware. When an issue occurs, up to 5-6 people are called in to coordinate on potential causes and solutions. What should take a few minutes to diagnose can now take hours.
  • "That's not my problem." No one likes getting that 2am call about a problem, but everyone is responsible for getting it fixed. All too often, groups consider themselves apart from the daily operations of a business. Developers take the stance that they design to spec and are not responsible for production issues. Quality Control is responsible only for a point-in-time snapshot, not continuous monitoring. IT, unaware of the business projections for an application, lectures its business units when capacity is reached. And on it goes. This kind of infighting creates an atmosphere of competition and aloofness, not the cooperation and collaboration required to get to the heart of a problem and fix it.
  • "Too many dials and knobs." In an era of endlessly flexible configurations, one of the most common problems is creating simple and useful ones. An application delivered with over 1,000 different configuration options, while admittedly very flexible, overwhelms many people trying to determine which options will most effectively meet demand. Rather than producing so many options, it is better to deliver a targeted set of instrumentation that is simple yet effective. While this takes more time during design, it is well worth it in production.
  • "Not keeping an open mind." All too often, when confronted with issues, people apply preconceived notions in an attempt to isolate the cause. The flaw in this approach is that in many cases the exact cause is unknown, so eliminating possibilities too early can cut off avenues of discovery that would help determine the problem. One example that comes to mind is a classic deadlock. The developers swore up and down that their Java code accounted for every possible deadlock, yet deadlocks plagued the application. After I calmly explained that reviewing how their code operated would help resolve an issue affecting millions of customers, it became evident that while their code was reasonably clean, it relied on a suspect third-party library. Reviewing that library showed that a lock on a database resource was to blame, and that lock in turn affected their code. Keeping yourself open to possibilities helps ensure that when something does break, it can be reviewed objectively and holistically.
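
The deadlock story involved a database lock, which the JVM itself cannot see, but the in-process version of the same trap is easy to illustrate. What follows is a minimal sketch, not the code from that incident: two threads (the names app-thread and library-thread, and the two lock objects, are hypothetical stand-ins for "our code" and "the third-party library") take two locks in opposite order, and the standard java.lang.management ThreadMXBean.findDeadlockedThreads() call reports the cycle without anyone having to guess which side is at fault. A lock held inside a database would instead have to be traced with that database's own lock-monitoring views.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DeadlockDemo {
    // Hypothetical stand-ins: one lock taken by "our" code, one by a third-party library.
    private static final Object appLock = new Object();
    private static final Object libraryLock = new Object();

    /** Starts two threads that take the locks in opposite order, then asks the JVM which threads are deadlocked. */
    static List<String> deadlockAndDetect() throws InterruptedException {
        // Ensures each thread holds its first lock before either reaches for its second.
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread app = new Thread(() -> lockInOrder(appLock, libraryLock, bothHoldFirstLock), "app-thread");
        Thread lib = new Thread(() -> lockInOrder(libraryLock, appLock, bothHoldFirstLock), "library-thread");
        // Daemon threads so the stuck pair does not keep the JVM alive after main returns.
        app.setDaemon(true);
        lib.setDaemon(true);
        app.start();
        lib.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        // Poll for up to about 5 seconds; findDeadlockedThreads() returns null when no cycle exists.
        for (int attempt = 0; attempt < 50; attempt++) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    names.add(info.getThreadName());
                    System.out.println(info.getThreadName() + " is waiting on " + info.getLockName()
                            + " held by " + info.getLockOwnerName());
                }
                break;
            }
            Thread.sleep(100);
        }
        return names;
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the other thread holds 'second' and is waiting for 'first'.
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> names = deadlockAndDetect();
        System.out.println(names.isEmpty() ? "No deadlock detected" : "Deadlocked: " + names);
    }
}
```

Running main prints which thread is waiting on which lock and which thread holds it. The same information appears in an ordinary thread dump (for example from jstack), which is often the quickest way to check whether third-party code sits in the middle of the cycle.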

Even in the most carefully handwritten custom code, something invariably behaves differently than expected. Only by accepting that things can go wrong, no matter how well planned, can enterprise technology groups hope to tackle problems quickly and efficiently.
