Who is Altan Khendup?

A professional technologist who dabbles in innovative and interesting uses of technology, Mongolian history, philosophy, and cooking ethnic foods.

Often described as part philosopher, scholar, technologist, and mentor, Altan likes engaging in stimulating conversations with professionals, tackling problems in a hands-on and collaborative manner with technology, and enjoying the company of good friends and family.

 


Entries in Data Access (2)

Sunday, Jul 18, 2010

Data As A Service - A Practical Viewpoint

In a recent discussion the topic of clouds arose. Many in the industry are still coming to grips with the fundamentals of this concept. In this conversation we worked through public vs. private, externally housed vs. internally housed, cloud-bursting, etc. Eventually we landed on the topic of services such as Infrastructure-as-a-Service, Software-as-a-Service, Platform-as-a-Service, and Data-as-a-Service.

We focused on Data-as-a-Service, mostly because the professionals I was speaking to had issues around data and were curious how Data-as-a-Service, or DaaS, could help address the concerns traditional companies now face.

If one looks at the internal arrangement of any company that has been around for a while, its data ecology is a mix of islands within the corporate environment: RDBMSs such as Oracle and SQL Server in varying numbers and versions, file-based data such as Excel spreadsheets, electronic communication such as email, etc. On these islands lie valuable pieces of data: keys, details about accounts, customers, strategic initiatives, etc. Historically, obtaining valuable information from the data islands spread throughout the ecology has been tremendously painful and labor-intensive, requiring an organization to invest significantly in things such as data warehousing. While that is effective on a number of levels, in today's age of lightning-fast change, being able to get to valuable information that resides within the data ecology, and to information about the ecology itself, is absolutely essential to survival.

This is where the concepts of services, particularly around DaaS, are very powerful. At a high level, DaaS gives an organization not only access to required information but also powerful discovery and self-evolving mechanisms that it did not previously have. There are a few key concepts that help make DaaS work:

  • Storage: This can be reasonably small to very large. In truth DaaS need not focus on this, as storage is usually addressed as Infrastructure-as-a-Service, or IaaS. Practically speaking, it is just a matter of understanding whether the goal of DaaS is to hook into an existing IaaS or to act as a mesh over existing islands.
  • Meta-Data Dictionary: Everyone in IT and development knows what a data dictionary is, and many business people could not care less. The idea here is to evolve from isolated data dictionaries, or even an enterprise data dictionary, into more of a Meta-Data Dictionary. The reason? Data dictionaries are what I view as instance-specific ways of making a definition, placing things into that definition, and interacting with that definition. Practically speaking, a data dictionary is how one gets at data within, say, an Oracle database or an Excel spreadsheet. When traversing the larger ecology, individual data dictionaries are tremendously ineffective, unable to handle rapidly changing contexts across extended domains. Meta-Data Dictionaries serve this purpose. They tap into the local dictionaries and extend them to include a much larger array of context across the organization, providing answers more rapidly while saving time and effort.
  • Meaning: The next major component of DaaS is meaning, which is not as commonly spoken about. Meaning in the context of DaaS is the ability to consistently present information not only about what resides in the ecology but also about the ecology itself, with regard to the domain of the organization. In a traditional relational database, for example, issuing a query does not take meaning into account. The results of a single query do not mean anything by themselves to the organization until they are placed within a broader context that spans not only the targeted island but all the desired islands within the ecology. As the context grows, so does the ever-changing complexity of establishing meaning. Working in tandem with a Meta-Data Dictionary, as opposed to individual data dictionaries, meaning can be quickly determined as a question is posed throughout the data ecology.
  • Discovery: This takes the paradigm of search and applies it throughout the DaaS ecology. Whenever events happen within the DaaS that affect the meaning as interpreted via the Meta-Data Dictionary, discovery adjusts to that by not only making older patterns available but newer ones as well. In this manner an organization is capable of discovering evolving patterns within their ecology as it relates to their business.
  • Living Data: This is old hat to many in the internet crowd but fairly new to organizations, especially since it arises with the mention of DaaS. The concept means that all data elements, as they change with respect to the data ecology, are available immediately upon a user request. Practically speaking, it appears to a consumer that data changes its behavior as they interact with it. Examples include Twitter updates, Google Finance chart navigation, or online banking activity from any bank. These systems not only process events but also reflect the impact of those events on the ecology back to the requester within seconds.
  • Services: A fundamental aspect of any "as-a-Service" model is the concept of services. The type, manner, number, and management of services in a DaaS should not be underestimated. Services need to be simple, powerful, and flexible to meet the needs of the organization. Because of its Meta-Data Dictionary, a typical DaaS also has services related to Meaning and Discovery that provide far more value than traditional access services such as query.
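
To make the Meta-Data Dictionary and Discovery ideas above a little more concrete, here is a minimal Python sketch. It is not the platform described in this post; every name in it (the islands, tables, fields, and the class itself) is a hypothetical example. It only shows the basic shape: local, instance-specific fields are registered under organization-wide concepts, so a question can be posed once and resolved across islands.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LocalField:
    """One entry in an island's own data dictionary (all names hypothetical)."""
    island: str      # e.g. an Oracle schema or an Excel workbook
    container: str   # table, sheet, or folder inside that island
    name: str        # column or field name as the island knows it


@dataclass
class MetaDataDictionary:
    """Maps organization-wide concepts onto the local fields that carry them."""
    concepts: dict[str, list[LocalField]] = field(default_factory=dict)

    def register(self, concept: str, local: LocalField) -> None:
        self.concepts.setdefault(concept, []).append(local)

    def locate(self, concept: str) -> list[LocalField]:
        """Return every local field, on any island, that carries the concept."""
        return list(self.concepts.get(concept, []))

    def discover(self, keyword: str) -> list[str]:
        """List concepts whose name or local field names mention the keyword."""
        kw = keyword.lower()
        return sorted(
            concept
            for concept, locals_ in self.concepts.items()
            if kw in concept.lower() or any(kw in f.name.lower() for f in locals_)
        )


mdd = MetaDataDictionary()
mdd.register("customer_id", LocalField("oracle_billing", "ACCOUNTS", "CUST_NO"))
mdd.register("customer_id", LocalField("excel_sales", "Q3 Leads", "Customer #"))
mdd.register("campaign_code", LocalField("sqlserver_marketing", "Campaigns", "CampCode"))

for local in mdd.locate("customer_id"):
    print(local.island, local.container, local.name)
```

A real Meta-Data Dictionary would also need to track context, change events, and meaning over time; the sketch covers only the mapping from shared concepts to local definitions.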

I have had the opportunity and privilege to work on such a platform in my career. Having a reasonably strong background in databases, I can say the transition was not an easy one. All the newer dimensions require significantly more consideration than a simple database perspective. For example, with a DaaS one can see changing patterns of behavior and gain insight into complex questions that were not really addressable before. In one of my previous roles at a large telecom, the question arose of how long it would take for the data elements of a particular marketing campaign to reach all the necessary parts of the organization. As in most organizations, the answer was not precise, since it was the result of asking each division and then aggregating the responses. In many cases the divisions were not 100% certain of the timing themselves. With my primitive platform in place, we were able to look up the information in a few minutes and give the organization a more comfortable, provable answer. The cost savings alone were well worth it: 2 minutes of a single individual's time vs. 30 minutes for 1200 people of varying levels. Other questions, such as how the ecology handles volumes, what volumes mean in relation to business operations, the amount and volume of meaning inconsistencies, and what savings could be achieved, are just some of the more typical operational questions. However, with a DaaS in place, higher-value insights can be gained, such as missed opportunities for new products/services based on customer activity, competitive standing based on social responses and replies with regard to existing products/services, capacity planning for bursting or planned progressions, and many others.
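
As a rough back-of-the-envelope check on the savings mentioned above, the figures can be turned into person-time. This assumes, purely for illustration, that each of the 1200 people spent the full 30 minutes and that everyone's time is valued equally, which the "varying levels" remark suggests is a simplification.

```python
# Figures quoted in the post; the equal-weighting of everyone's time is an assumption.
platform_minutes = 2 * 1        # 2 minutes for a single individual
survey_minutes = 30 * 1200      # 30 minutes for each of 1200 people

survey_hours = survey_minutes / 60
ratio = survey_minutes / platform_minutes

print(f"Asking each division: {survey_minutes} person-minutes ({survey_hours:.0f} person-hours)")
print(f"Platform lookup:      {platform_minutes} person-minutes")
print(f"Ratio:                {ratio:.0f}x")
```

Under those assumptions the manual approach costs 36,000 person-minutes (600 person-hours) against 2 person-minutes for the lookup.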

It was at this point that my colleagues were thinking it would take them years to build out a DaaS. I responded that a DaaS does take effort, but not necessarily time. It is an equal mix of deep technology, which would be a blend of building it and using vendor tools, and the expertise and knowledge of technical and business staff. From a tools perspective, solutions from vendors such as QuePlix for data virtualization and Kapow for integration can leverage existing domains and get an organization with significant existing assets to DaaS basics very rapidly. The core requirement is commitment from the organization. An undertaking such as DaaS is something fundamental to the culture, not just a dalliance.

Then I pointed out the shifting landscape of competitive pressure due to the economic crisis. Those with stronger, more valuable, more flexible, and more timely interactions with their data ecologies are the ones that typically engage their customers more meaningfully, whereas those with fewer capabilities quickly find themselves losing opportunities to competitors. From a career standpoint, many of the new technologies related to DaaS, such as cloud concepts, big data, distributed data, and the like, are some of the most in-demand skills, not just hands-on but in management, deployment, architecture, etc. As more and more companies realize the value of DaaS along with other strategic approaches, they are moving to embrace them in order to stay competitive and survive.

Tuesday, Feb 16, 2010

Database Access - A Tale of Viewing Data

Whether it is MySQL, Oracle, DB2, Postgres, Microsoft SQL Server, or any other relational database engine, one of the most commonly misunderstood aspects is its access. Having gone through many generations of relational databases and numerous applications built on them, I find there are strong elements to consider based on a variety of factors:

  • Growth. This is based on three dimensions: past usage, predicted usage, and actual usage. All three need to be collected, retained, and actually compared on a regular basis, preferably in an automated fashion. This is usually left to operations to "deal with". However, the analysis is very important and goes beyond operational staff to analysts, who should actually be mapping their predictions to actuals, especially with regard to the accuracy of forecasts. Knowing that you are wildly off base is just as important as knowing that you are right on target.
  • Stability. This is a really simple thing that is often misunderstood. It is the adherence to an SLA. However, there are several important elements that should always be considered:
    • All user applications have an operational SLA of 100% availability. The application should never, ever, ever, ever go down in terms of its operational aspect to users. Most companies I have seen have this set to 100% or 99.9999%.
    • Individual component SLAs can be whatever is appropriate so long as they do not negatively impact the end user SLA.
    • Tie SLA performance to compensation for the various groups: development, operations, planning, etc. Leaving it as a high priority to one group and not the others will create infighting due to the appearance of lackadaisical attitudes.
  • Innovation. Relational databases are among the most well understood, stable, reliable, and hence common stores of data in most companies. However, common data management practices are often well behind the times in keeping up with innovative techniques that can help the business. Some of the more common examples I have implemented:
    • Search engines. This goes well beyond the common datamarts; rather, it is the ability to mine the data for relevant information based on search techniques. Imagine being able to access legacy data as easily as running a Google-like keyword search. Most companies think of this as a "nice to have". In truth, the approach is useful even with operational information such as dumps, logs, and error handling. It can speed up the operational response to issues, help users find key data without overly taxing the main data store, and even help address data aging and replication without compromising the core data set. Search engines can also help compress data requirements so that they become more manageable. Case in point: relationally speaking, 30 days of regional order data for a large telco's national customers was about 2-4 terabytes a month, with many reports often taking a full 48 hours to retrieve. Converting several dimensions of the data, or raw portions of it, to the open source Lucene search engine reduced the actual informational requirements to about 1 TB, with access times in the milliseconds.
    • Data replication and abstraction via NoSQL. Replication is a common idea, but usually between relational engines. One proposal for any legacy company is to take a segment of its data and move it to NoSQL solutions such as Hadoop. This allows for better recoverability and accessibility of data in an inexpensive and agile manner, while still leveraging existing data stores for current business needs.
    • Enablement of web services (especially REST). Often there are so many demands on the data store that at some point the argument arrives at ownership vs. consumption. In truth it is a little less about ownership and a whole lot about maintainability. Many shops are ill-equipped to handle several dozen clusters of sharded data, let alone the growing demands of what could be hundreds if not thousands of requests. The best answer is to allow a little sharding via SQL or NoSQL solutions, but with a mindset towards web services, especially with RESTful interfaces. The idea should come as no surprise given that one of the most scalable ecosystems happens to be Twitter. By providing basic interfaces on a REST approach, it has enabled thousands of applications to be constructed. Throw in some good corporate governance, and there is little reason data infrastructure should not be similarly positioned to provide maximum flexibility for internal groups trying to deliver key applications to the business.
  • Data retention. The most commonly adopted mechanism for databases is what I call data retention. All groups ask how long they should keep "the data". The answer is always something akin to 5 years or so. In truth, data should never really be discarded. There is little reason to, given the litany of alternatives for storing the data and the usefulness of keeping it to better position the business. Take Google, for example. They do not really get rid of data per se; rather, they put the most "hot" data on top, with the other "cold" data eventually getting buried. This enables them to create very rich analytics for new products and services and to make better overall strategic decisions. Every piece of data being "thrown away" is something being lost to the business.
  • Speed of access. Data should be made available quickly and reliably. Often this boils down to creating really fast access to the data. This is where things get ugly, because in almost all cases data administrators want less access while consumers want more. Web services can help mitigate the argument; however, replying quickly to data requests is absolutely paramount. Heck, if people can get queries answered by Facebook, Google, or Twitter in milliseconds over the internet, intranet queries should not be taking minutes. Creating a rich, useful, and scalable access approach lies in the data administrator's domain and should be regarded with a much larger view than simply SQL or reporting queries.
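
To illustrate the search engine point from the list above, here is a minimal Python sketch of an inverted index, the core data structure that engines such as Lucene build on. It is not Lucene and not the telco implementation mentioned earlier; the class, the tokenizer, and the sample log lines are all hypothetical. It only shows why a keyword lookup over operational data such as logs can be answered without scanning the main data store.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class InvertedIndex:
    """Maps each term to the ids of the documents that contain it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.documents: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.documents[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> list[int]:
        """Return ids of documents containing every query term (AND search)."""
        terms = tokenize(query)
        if not terms:
            return []
        # Use .get so that searching never adds empty entries to the index.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)


# Hypothetical operational log lines standing in for legacy data.
index = InvertedIndex()
index.add(1, "ORA-00060 deadlock detected while updating ORDERS")
index.add(2, "Nightly order export finished in 42 minutes")
index.add(3, "Deadlock retry succeeded for ORDERS batch 17")

print(index.search("deadlock orders"))  # → [1, 3]
```

Each lookup touches only the postings for the query terms rather than every row, which is the property that makes millisecond access plausible. A production engine adds ranking, stemming, compression, and persistence on top of this.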

In many companies I often see "new" initiatives lock horns with the "old" data wranglers. There is never a need for these confrontations. Both can work really well together to deliver new and exciting options to any business. Hopefully, just looking at the world's growing demand for immediate information, and at how internet companies are handling that issue, will help foster better solutions for companies.