Friday, January 27, 2006

N-tier Infrastructures, ITIL & Cross-Silo Performance Baselines

It's no accident that the evolution to SOA and adoption of ITIL are happening at the same time. IT Service Management is, as the name indicates, about Services. If you're evolving to Service Oriented Architectures (SOA), then adoption of IT service management based on the IT Infrastructure Library is simply common sense. The supporting infrastructures enabling SOA are typically n-tier: web front ends, application servers, database servers, etc.

N-tier infrastructures are typically made up of multiple Configuration Item (CI) Segments, for example the Citrix segment, the backbone WAN segment, the Web front-end segment, etc. These are often the "IT silos" we hear about.

Implementing a quality framework such as ITIL is very much about establishing cycles of continuous improvement and shifting paradigms from silos to services. In fact, the sooner you can establish the concept of rapid cycles of continuous improvement within your service improvement teams, the better.

Many clients focus the initial ITIL implementation efforts on Change, Configuration and Release Management, which can lead to an initial improvement cycle that is just too long. This is especially true if, in an attempt to define services from the business' perspective, service definition takes an 'end-to-end' view, since all the tiers are now involved.

This increases the scope and complexity of CI relationships and CMDB establishment, and (more often than not) leads to the purchase of a CMDB tool... perhaps before you're really ready, since you may not have had any improvement cycles in other ITSM process areas. Design and development of the CMDB should be carefully planned, and must support every ITIL process.

Implementing cross-silo performance monitoring (i.e., true service monitoring, not simply response-time monitoring) can provide service baselines of performance across every layer of every tier of your n-tier infrastructure.

This offers several advantages:
  • Clearly establishes service performance in both business and IT terms
  • Quickly identifies cross-silo dependencies for each service
  • Helps scope and target configuration design, development & baselining activities
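
To make the idea of a cross-silo baseline concrete, here is a minimal sketch (in Python, with hypothetical segment and metric names) of what a per-segment, per-metric baseline for a single service might look like. It is illustrative only, not a description of any particular monitoring product.

```python
from dataclasses import dataclass, field
from statistics import mean, quantiles
from typing import Dict, List, Tuple

# Hypothetical model: each CI segment (silo) reports a few key metrics for a
# service, and the "baseline" is simply the observed distribution of each
# metric per segment over a sampling window.

@dataclass
class SegmentSample:
    segment: str   # e.g. "web-frontend", "app-server", "database", "WAN"
    metric: str    # e.g. "response_ms", "cpu_pct", "queue_depth"
    value: float

@dataclass
class ServiceBaseline:
    service: str
    window: Dict[Tuple[str, str], List[float]] = field(default_factory=dict)

    def record(self, sample: SegmentSample) -> None:
        self.window.setdefault((sample.segment, sample.metric), []).append(sample.value)

    def summarize(self) -> Dict[Tuple[str, str], dict]:
        """Per-segment, per-metric baseline: mean and a rough 95th percentile."""
        summary = {}
        for key, values in self.window.items():
            p95 = quantiles(values, n=20)[-1] if len(values) >= 2 else values[0]
            summary[key] = {"mean": mean(values), "p95": p95, "samples": len(values)}
        return summary

if __name__ == "__main__":
    baseline = ServiceBaseline(service="order-entry")
    for seg, metric, value in [
        ("web-frontend", "response_ms", 120.0), ("web-frontend", "response_ms", 140.0),
        ("app-server", "response_ms", 80.0),    ("app-server", "response_ms", 95.0),
        ("database", "query_ms", 35.0),         ("database", "query_ms", 50.0),
    ]:
        baseline.record(SegmentSample(seg, metric, value))
    for (segment, metric), stats in baseline.summarize().items():
        print(segment, metric, stats)
```

Even something this simple makes the cross-silo dependencies of a service explicit and gives each silo a shared, numeric view of "normal" to compare against.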

Implementing a well designed CMDB is important, but can take many months. Installing a service monitor can be accomplished in weeks. In addition, establishing service monitoring, and obtaining cross-silo performance baselines for each critical service, may help you get to a level of process maturity where CMDB definition and design make more sense, and save you money along the way.

John M. Worthington, Principal
MyServiceMonitor, LLC

Friday, January 13, 2006

Monitoring - The Evolution

Monitoring their IT infrastructures has historically been an afterthought for organizations; priority has always centered on application development and deployment. As a result, the monitoring industry traditionally lags a little behind as technologies evolve and the landscape of application development and delivery shifts. In recent years a disconnect has emerged in the market between how applications are designed to work and how they are being monitored. Let's take it from the beginning... In the early days, applications were relatively simple. Twenty years back there were mainframes and clients, so if there was a problem it was easy to locate: it was either in the mainframe, which affected everyone, or in the client. There were two relatively easy pieces to monitor. Most of the legacy players in the monitoring industry today evolved at this stage.

Later came the networking era, when networks became far more complex and problems at the network level became an industry nightmare. At this stage every single problem was blamed on the network, and most of the time it turned out to be true. Many tools cropped up specifically to monitor networks and isolate issues at the network level... Over time, networks stabilized as networking technology improved. But the fiasco of those early days left such an indelible mark on the industry that even today, in most organizations, the network department is practically a secretive cult, and no one outside it gets to know its internals. The legacy monitoring players took time to get the network piece right, but they eventually solved the network puzzle to a reasonable extent... so the market now has a set of key players who can do client/server, legacy and network monitoring well.

By the time this played out, the technology in the application development and delivery landscape had moved on... to n-tier architectures. N-tier architectures provide extreme flexibility, portability and scalability to application services, and the IT industry has embraced the n-tier distributed architecture for its effectiveness and cost efficiency. This is the preferred architecture for the omnipresent web services. An unattractive side effect of the n-tier architecture is that it introduced an amazing amount of complexity into the delivery infrastructure. Now multiple applications, written in multiple languages and running on multiple pieces of hardware, must co-exist for the service to be effective. Due to this interdependency, any small issue on one of the tiers tends to have a big impact on the service in a cascading effect. This, coupled with the complex nature of these systems, makes isolating and identifying issues within the system a nightmare.

The solution put forward by legacy monitoring players to this problem is silo monitoring... effectively a tool to monitor every tier. In this model, even a simple web service would have three different tools monitoring three different tiers (web, app, db). These tools are strong in their own domain and need a domain expert to run them. When there is a problem in the overall service, the different tools run by different domain experts have to be brought together to identify the root cause and what needs to be fixed. Since there is no transparency across the layers, most of these meetings turn into an exercise in the blame game. People tend to get defensive about their tier, and it takes an extraordinarily long time to isolate even the simplest problems in this model. Hence the approach of monitoring n-tier architectures tier by tier, as proposed by the legacy monitoring players, doesn't work. This is the primary reason for the chaos in service delivery, and it affects the quality of service delivery even for Fortune 500 companies.

The right way to do this is to monitor the entire service as a single atomic unit instead of as individual tiers. Monitoring every tier end to end and then bringing them together as a single service gives you a complete perspective of that service. This enables the tool to assess the impact of failures across the entire service. The tool also needs a correlation engine sophisticated enough to differentiate between causes and effects when an n-tier architecture goes through a cascading failure. Building a tool that monitors all the tiers of an n-tier infrastructure with equal competence and represents them in a uniform model is not an easy task; this is the primary reason you don't see many tools in the market that do it. Finally, the tool has to provide the service operator with information he can act upon immediately, rather than data that puts the onus on him to figure out the event. This is where the future of the monitoring industry lies.
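
By way of illustration only, here is a minimal sketch of that cause-versus-effect idea, assuming a hypothetical dependency map between tiers. A real correlation engine would weigh timing, severity, topology changes and many other signals; this just shows the core notion of suppressing downstream effects during a cascade.

```python
from typing import Dict, List, Set

# Hypothetical topology: each tier lists the tiers it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "web-frontend": ["app-server"],
    "app-server": ["database", "auth-service"],
    "database": [],
    "auth-service": [],
}

def _reachable(tier: str) -> Set[str]:
    """All tiers the given tier transitively depends on."""
    seen: Set[str] = set()
    stack = list(DEPENDS_ON.get(tier, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPENDS_ON.get(dep, []))
    return seen

def probable_causes(alarming: Set[str]) -> Set[str]:
    """Keep only alarming tiers that no alarming dependency can explain;
    everything else is treated as a downstream effect of the cascade."""
    return {tier for tier in alarming if not (_reachable(tier) & alarming)}

if __name__ == "__main__":
    # A cascading failure: the database degrades, and the app and web tiers alarm too.
    alarming_tiers = {"web-frontend", "app-server", "database"}
    print("probable cause(s):", probable_causes(alarming_tiers))   # -> {'database'}
```

In this toy example the database is reported as the probable cause, while the web and app tier alarms are treated as downstream effects, which is the kind of actionable answer the operator needs instead of three unrelated alerts.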