Friday, January 13, 2006

Monitoring - The Evolution

Monitoring their IT infrastructure has historically been an afterthought for organizations; priority has always centered on application development and deployment. Hence the monitoring industry traditionally lags a little behind as technologies evolve and the landscape of application development and delivery shifts. In recent years a disconnect has been emerging in the market between how applications are designed to work and how they are being monitored. Let's take it from the beginning... In the early days applications were relatively simple. Twenty years back there were mainframes and clients, so if there was a problem it was very easy to locate: it was either in the mainframe, which affected everyone, or in the client. There were two relatively easy pieces to monitor. Most of the legacy players in the monitoring industry today evolved at this stage.

Later came the networking era, when networks became a lot more complex and problems at the network level became an industry nightmare. At this stage every single problem was blamed on the network, and most of the time it turned out to be true. So many tools cropped up specifically to monitor networks and isolate issues at the network level... Over time networks stabilized as networking technology improved, but the fiasco of the early days left such an indelible mark on the industry that even today, in most organizations, the network department is practically a secretive cult and no one outside of it gets to know its internals. The legacy monitoring players took time to get the network piece right, but they eventually solved the network puzzle to a reasonable extent... so the market now has a set of key players who can do client/server, legacy and network monitoring well.

By the time this played out, the technology in the application development and delivery landscape had moved on... to n-tier architectures. N-tier architectures provide extreme flexibility, portability and scalability to application services, and the IT industry has embraced this distributed architecture for its effectiveness and cost efficiency; it is the preferred architecture for the now-omnipresent web services. An unattractive side effect of the n-tier architecture is that it introduced an amazing amount of complexity into the delivery infrastructure. Now multiple applications written in multiple languages, running on multiple pieces of hardware, must co-exist for the service to be effective. Due to this interdependency, even a small issue in one of the tiers tends to have a big impact on the service through a cascading effect. This, coupled with the complex nature of the systems, makes isolating and identifying issues within the system a nightmare.
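To put rough numbers on that interdependency, here is a back-of-the-envelope sketch in Python (the tier names and availability figures below are made up for illustration, not taken from any real deployment). If every tier must be up for the service to work, tier availabilities multiply, so tiers that each look fine on their own can still add up to a shaky service:

    # Illustrative only: five hypothetical tiers, each individually "good".
    tiers = {"load balancer": 0.999, "web": 0.995, "app": 0.995,
             "db": 0.997, "storage": 0.998}

    # If the service needs every tier up, availabilities multiply.
    service_availability = 1.0
    for name, availability in tiers.items():
        service_availability *= availability

    print(f"service availability: {service_availability:.3%}")
    # -> about 98.4%, i.e. roughly 139 hours of downtime a year,
    #    even though no single tier looks worse than 99.5%.

And that is before counting the cascading failures, where one tier's hiccup drags the tiers above it down with it.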

The solution put forward by the legacy monitoring players for this problem is silo monitoring... effectively a tool to monitor every tier. In this model, even a simple web service has three different tools monitoring three different tiers (web, app, db). These tools are strong in their own domains, and each needs a domain expert to run it. When there is a problem in the overall service, the different tools run by different domain experts have to be brought together to identify the root cause and what needs to be fixed. Since there is no transparency across the layers, most of these meetings turn into an exercise in the blame game. People get defensive about their own tier, and it takes an extraordinarily long time to isolate even the simplest problems in this model. Hence the approach of monitoring an n-tier architecture tier by tier, as proposed by the legacy monitoring players, doesn't work. This is the primary reason for the chaos in service delivery, and it affects the quality of service delivery even for Fortune 500 companies.

The right way to do this is to monitor the entire service as a single atomic unit instead of as individual tiers. Monitoring every tier end to end and then bringing them together into a single service view gives you a complete perspective on the service, and enables the tool to assess the impact of failures across the entire service. The tool also needs a correlation engine sophisticated enough to differentiate between causes and effects when an n-tier architecture goes through a cascading failure. Building a tool that monitors all the tiers of an n-tier infrastructure with equal competence and represents them in a uniform model is not an easy task, which is the primary reason you don't see many tools in the market that do it. Finally, the tool has to give the service operator information he can act upon immediately, rather than raw data that puts the onus on him to figure out the event. This is where the future of the monitoring industry lies.
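To make the cause-versus-effect idea concrete, here is a minimal Python sketch of dependency-based correlation (the tier names and the single rule below are hypothetical simplifications, not a description of any particular product). Give the engine the service's dependency graph; a tier that is alerting while everything it depends on is healthy is a probable cause, and a tier alerting on top of an alerting dependency is probably just a symptom:

    # Each tier lists the tiers it directly depends on: web calls app, app calls db.
    DEPENDENCIES = {"web": ["app"], "app": ["db"], "db": []}

    def probable_root_causes(alerting_tiers):
        """Return alerting tiers whose own dependencies are all healthy."""
        alerting = set(alerting_tiers)
        causes = []
        for tier in sorted(alerting):
            deps = DEPENDENCIES.get(tier, [])
            # No alerting dependency beneath it -> likely a cause, not a symptom.
            if not any(dep in alerting for dep in deps):
                causes.append(tier)
        return causes

    # A db failure cascades and all three tiers alert, but only the db is the cause.
    print(probable_root_causes(["web", "app", "db"]))  # -> ['db']
    # The db is healthy, so the app is the cause and the web alert is a symptom.
    print(probable_root_causes(["web", "app"]))        # -> ['app']

A real correlation engine would also weigh timing, severity and history, but even this one rule is enough to turn three simultaneous alerts into a single actionable answer.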

6 Comments:

At 10:40 AM, Blogger roger_nixon said...

Have you thought about SOA - "Service Oriented Architectures" - and what that could do? The vision for SOA is that everything is a service :-) and the data flow may be decided dynamically - e.g., which active directory service to use, which payment gateway to use, etc. If SOA really gets implemented the way it is being envisioned, the folks in the management space will have work to do for decades :----)

 
At 11:35 AM, Blogger Robert Butler said...

Having been in the networking space for a long time, I completely agree with your view of how monitoring should be done for n-tier architectures. The challenge is getting buy-in from all the silos... to get a single product across the entire service, through all the tiers... I am fighting that battle right now..

 
At 12:13 PM, Blogger Tim said...

You're right - SOA is a dream fraught with a lot of gotchas.

 
At 11:51 PM, Blogger John M. Worthington said...

SOA and IT Service Management go hand-in-hand... cross-silo monitoring provides the opportunity to get people on the same page, and is a good 'first step' towards implementing IT service management best practice.

Getting silos to work together often means getting rid of the "three-card monte" routine between IT tribes when it comes to end-to-end service performance. At that point, you can get your stakeholders together and begin working as a team.

Stakeholder Analysis includes fellow IT team members in addition to your customers!

Providing each silo with a common, service-oriented view of the infrastructure, and enabling them to quickly see the impacts their silo has on other silos in real time, is the fast track to shifting paradigms from silos to services.

John M. Worthington
Principal
MyServiceMonitor, LLC

 
At 6:04 PM, Anonymous Douglas W Stevenson said...

The n-Tier architectures you see today are the result of evolution. When what they needed to do could not be done on a single host per app, they put in tiers of servers to layer out the tasks.

The danger in n-Tier architectures is that the farther you go through the tiers with your data / information, the more convoluted the data may become - to the point where it may rule out n-Tier architectures as a two-way information flow.

SOA is, for all intents and purposes, a programmer's dream of treating everything like a web transaction. However, not every transaction is suitable for a Web front end. For example, if I have a process that scrolls through a series of log files looking for a specific pattern match, and this takes a significant amount of time, it creates havoc in the Web arena. End users think the process quit working, so they hit the query over and over until the resources are massively consumed.
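A minimal Python sketch of the usual escape hatch for that scenario (every name here is made up for illustration): accept the query, hand back a job id right away, and let the impatient user poll a cheap status call instead of kicking off the same expensive scan again and again:

    import threading
    import time
    import uuid

    jobs = {}  # job_id -> "running" or the finished result

    def scan_logs(pattern):
        # Stand-in for the slow pattern match over a series of log files.
        time.sleep(5)
        return "matches for " + repr(pattern)

    def submit_scan(pattern):
        # Return a job id at once; do the slow work in the background.
        job_id = str(uuid.uuid4())
        jobs[job_id] = "running"

        def worker():
            jobs[job_id] = scan_logs(pattern)

        threading.Thread(target=worker, daemon=True).start()
        return job_id

    def check_scan(job_id):
        # Cheap status call the web tier can answer instantly, so repeated
        # clicks no longer consume the resources of a fresh scan each time.
        return jobs.get(job_id, "unknown job")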
For SOAP to work, it needs to be able to close its envelope - not possible in certain scenarios.
And while folks tout SOA and WSDLs going through UDDI, many implementations circumvent UDDI altogether (by hardcoding URLs), effectively rendering the portability useless.
SOA, in its current forms, tends to exhibit scaling issues because of its dependence on other technologies like J2EE, EJB, containers, and even things like JBoss.

When you start reading the TMN specs like 508 and 613, you start to see a sort of vision of a conceptual CMDB implementation. If you look closely at the TMN standards, they explicitly delineate the whats, hows and wherefores of object modelling an IT infrastructure. Of the CMDB implementations I've seen, most would do well to follow the structures already delineated in the TMN specs.

However, in real life, it does not work that way. You have data owners, users, and subscribers. Is the CMDB implementation done by BMC ever going to be a BETTER data source for a router's config information than the router itself?

In fact, application designers and infrastructure architects are passing over the obvious fact that a huge portion of the CMDB is ALREADY THERE! The sad part is that the current recommended CIM schemas mandate that a majority of the correlation work be accomplished at the TOP of the n-tier architecture. Now, all of a sudden, a database has to do all of the work associated with NMS and correlation. (Somebody who didn't understand what correlation really means came up with this crap!)

All in all, SOA is another flash in the pan... destined to repeat the history of CORBA. We didn't learn the lessons of CORBA, so we are doomed to repeat history, like Groundhog Day!

FWIW, I believe in the promise of EDA (Event Driven Architectures) and Grid technology. Having previously worked at a company that had Grid technology (that worked!) internal to its infrastructure, I can see that the two rules of thumb I had to develop there are not used outside of that environment - Rule #1: your application must scale. Rule #2: everything must run 24 by forever. And a quote from my old friend Sean Egan - if you can get your application to scale, you can buy performance! And for the Paul Harvey: a few years back, when MSN was down for 4 days because of DNS and DNS architecture issues, there was one place on the Internet where you could still get to MSN - via an AOL client! When you cache the Internet to be a good Net citizen, you get little benefits that go along with that. Somebody @ AOL was thinking on their feet.

 
At 4:42 PM, Anonymous MarkS said...

It's all about visibility and control. They are without a doubt two of the biggest problems we face. Monitoring and a CMDB are critical to operating an efficient infrastructure suite, be it n-tier or SOA... or Grid, for that matter.
Not having your configuration data up to date and in one place is a severely limiting decision that affects service levels, DR capability and the ability to improve overall utilization of one's assets.

On the other hand, using your end users for monitoring will get you fired. We need to stop doing exploratory surgery and get to the MRI version of monitoring: 5 mouse clicks to the point of pain. Oh, and don't forget, it can't take an army to keep it up to date. With 80%+ of the data center being commodity-based, it shouldn't be that hard.

Making progress but carefully.

 
