Database Access - A Tale of Viewing Data
Tuesday, February 16, 2010 at 12:03PM Whether it is MySQL, Oracle, DB2, PostGres, Microsoft SQL server, or any other relational database engine, one of the most common misconceptions happens to be it's access. Having gone through many generations and numerous applications based on relational databases there are strong elements to consider based on a variety of factors:
- Growth. This is based on three dimensions: past usage, predicted usage, and actual usage. All three of these dimensions need to be collected, retained and actually compared on a regular basis. Preferably in an automated fashion. This is usually left to operations to "deal with" however the analysis is very important and goes beyond operational staff to analysts who should actually be mapping their predictions to actuals, especially with regard to the accuracy of forecasts. If you are wildly off base then that is just as important as being right on target.
- Stability. This is a real simple thing often times well misunderstood. This is the adherence to an SLA. However there are several important elements that should always be considered:
- All user applications have an operational SLA of 100% availability. The application should never, ever, ever, ever go down in terms of it's operational aspect to users. Most companies I have seen have this set to 100% or 99.9999%.
- Individual component SLAs can be whatever is appropriate so long as they do not negatively impact the end user SLA.
- Tie SLA performance to compensation for the various groups: development, operations, planning, etc. Leaving it as a high priority to one group and not the others will create infighting due to the appearance of lackadaisical attitudes.
- Innovation. Relational databases are one of the most well understood, stable, reliable and hence common store of data in most companies. However often times common data management practices are well behind the times in actually keeping up to date with innovative techniques that can help the business. Some of the more common examples I have implemented:
- Search engines. This goes well beyond the common datamarts, rather this is the ability to mine the data for relevant information based on search techniques. Imagine being able to access the legacy data as easily as running a Google-like key word search. Most companies think of this as a "nice to have". In truth using the approach even with operational information such as dumps, logs and error handling is useful. It can speed up the operational response to issues, help users find key data without overly taxing the main data store, and it can even help address data aging and replication without compromising the core data set. Search engines can also help compress data requirements so that they become more manageable. Case in point: relationally speaking regional order data for a large telco for 30 days to their national customers was about 2-4 terabytes a month with many reports often times taking a full 48 hrs to retrieve. Converting several dimensions of the data or raw portions of it to the open source Lucene search engines reduced the actual informational requirements down to about 1TB with access times in the milliseconds.
- Data replication and abstraction via NoSQL. This idea is common but usually with relational engines. One proposal for any legacy company is to take a segment of their data and move it to NoSQL solutions such as Hadoop. This allows for better recoverability and accessibility of data in an inexpensive and agile manner, while still leveraging existing data stores for current business needs.
- Enablement of web services (especially REST). Often times there are so many demands on the data store that at some point the argument always arrives at ownership vs. consumption. In truth it is a little less about ownership and a whole lot about maintainability. Many shops are ill-equipped to handle several dozen clusters of sharded data let alone the growing demands for what could be hundreds if not thousands of requests. The best answer is to allow a little sharding via SQL or NoSQL solutions, but with a mindset towards web services especially with RESTful interfaces. The idea should come as no surprise given the fact that one of the most scalable eco-systems happens to be Twitter. By providing basic interfaces on a REST approach, thousands of applications are able to be constructed. Throw in some good corporate governance, there is little reason data infrastructure should not be similarly positioned to provide the maximum flexibility for internal groups trying to provide key applications to the business.
- Data retention. The most commonly adopted mechanism for databases is what I call data retention. All groups ask how long they should keep "the data". The answer is always something akin to like 5years or so. In truth data should never really be discarded. There is little reason to given the litany of alternatives to store the data and the usefulness of keeping this data to better position the business. Take for example Google. They do not really get rid of data per se, as opposed to putting the most "hot" data on top and with the other "cold" data eventually getting buried. This enables them to create very rich analytics for new products, services and to make better overall strategic decisions. Every piece of data being "thrown away" is something being lost to the business.
- Speed of access. Data should be made available quickly and reliably. Often times this boils down to creating really fast access to the data. This is where things get ugly because in almost all cases, what data adminstrators want is less access while consumers want more. Web services can help mitigate the argument however replying quickly to data requests is absolutely paramount. Heck if people can get queries from FaceBook, Google or Twitter in milliseconds over the internet, intranet queries should not be taking minutes. Creating a rich, useful and scalable access approach lies in the data administrators domain and should be regarded with a much larger view than simply SQL or reporting queries.
Often times in many companies I see "new" initiatives lock horns with the "old" data wranglers. There is never a need to have these confrontations. Both can work really well together to deliver new and exciting options to any business. Hopefully just looking at the world's growing demand for immediate information and how internet companies are handling that issue will help foster better solutions for companies.


Reader Comments