Who is Altan Khendup?

A professional technologist who dabbles in innovative and interesting uses of technology, Mongolian history, philosophy, and cooking ethnic foods.

Often described as part philosopher, scholar, technologist, and mentor, Altan likes engaging in stimulating conversations with professionals, tackling problems in a hands-on and collaborative manner with technology, and enjoying the company of good friends and family.

 

Entries in PeopleSoft (6)

Tuesday, Jan 18, 2011

Everyday Innovation - Using Hadoop with PeopleSoft Part 1

Over my professional career, it has been interesting to see how organizations face challenges to their businesses. On many projects I am involved from the technical side, since technology plays a very important role in today's marketplace. Many organizations struggle to bridge the gap between delivering new, modern offerings and maintaining their legacy assets. This is mostly because the two groups view problems in fundamentally different ways: legacy teams are often heavily grounded in risk-averse, tried-and-true approaches, whereas the groups building new offerings tend to be more progressive.

On all of the projects I work on, I apply a mixture of experience and open-mindedness, approaching issues from multiple points of view for the benefit of the organization and its customers. One approach that I have used across multiple companies is applying Hadoop to legacy applications such as Oracle/PeopleSoft. Hadoop is an excellent platform for data-intensive distributed computing, and among the many legacy applications that large organizations operate, products such as Oracle/PeopleSoft are quite common.

Oracle/PeopleSoft produces a lot of very interesting data, not just the standard metadata and transactional data but also operational data such as the logs found in the various tiers. The most obvious challenges for any organization are that this data comes in diverse formats, is located on different servers, takes up a lot of space, and is hard to work with. However, the benefits of this data are many and include capacity planning, holistic event correlation, business activity analysis, geolocation analysis, and many others.

In almost every organization, analyzing business data in legacy Oracle/PeopleSoft consists of SQL-oriented operations against production copies. While this is fine for the database itself, the equally valuable logging and operational data is often left behind due to space constraints. This is where Hadoop comes in.

When I began working with Hadoop, I used the Apache open source version. However, as time progressed, legacy organizations became less comfortable with such implementations. Thankfully, organizations such as Cloudera have emerged that not only provide excellent support for Hadoop but have also created bundled implementations that can be more easily introduced into companies via support and training programs.

In this first post, I will be demonstrating the power of Hadoop with simple examples based on the Cloudera distribution and Oracle/PeopleSoft CRM 9.

The Hadoop deployment is a development instance running 4 nodes distributed across 4 different data centers in the continental United States. This is a very small cluster: each node has 4 CPUs, 4 GB of RAM, and only 200 GB of disk, and the nodes in the cluster have a total capacity of some 300 GB. Obviously not all of the disk space on the nodes is allocated to Hadoop in this example, but that can be easily adjusted based on the resources available in your organization for the cluster.

I will not go into the full details of installing and configuring Hadoop, as this process is described in great detail at both Cloudera's website and the Apache Hadoop website, depending on your choice of implementation.

After installing, configuring, and starting your Hadoop cluster, please ensure that you have access to the commands shown below...

Figure: Hadoop command line interface

There are numerous commands that can be executed. If one has a systems administration background, many of these will be somewhat familiar, with obvious differences due to the distributed nature of Hadoop.

Each of the commands in turn has additional layers of assistance. For example, one of the more common operations involves the Hadoop filesystem...

Figure: Hadoop FS command

The FS command is useful because it can be used in scripts to deal with basic data elements on the file system. For example, let's take one of the example Hadoop Java programs ...

Figure: Example Java program on the file system

As one can see from this brief display, it is a straightforward example of raw Java code.

One of the simpler capabilities of Hadoop is to take any element from the file system, such as this Java program, and interact with it via Hadoop through command line operations.

In the following example, I will place the Java program onto the Hadoop distributed file system (HDFS) via the FS put command ...

Figure: Placing the Java program onto HDFS

As can be seen in this example, the HDFS listing indicates that the program called sample.java has been copied into Hadoop.
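The put and listing steps can also be scripted. The following is a minimal Python sketch, assuming the hadoop client is on the PATH. The /user/demo target directory and the helper function names are hypothetical, and the commands are only printed unless dry_run is set to False.

```python
import shlex
import subprocess


def build_put_command(local_path, hdfs_dir):
    """Return the argv that copies a local file into an HDFS directory."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def build_ls_command(hdfs_dir):
    """Return the argv that lists an HDFS directory."""
    return ["hadoop", "fs", "-ls", hdfs_dir]


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        # Raises CalledProcessError if the hadoop client reports a failure.
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical target directory; sample.java is the file named in the post.
    target = "/user/demo"
    run(build_put_command("sample.java", target))
    run(build_ls_command(target))
```

Keeping command construction separate from execution makes it easy to review what a script would do before pointing it at a real cluster.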

While there are additional command line parameters that one can use to verify this operation, another way to look at the operation is via the web interface that comes with the Cloudera distribution.

Figure: Cloudera Hadoop Distributed File System Web Interface

As can be seen, the HDFS web interface provides some basic information about the Hadoop cluster such as when it was started, the version, and the capacity of the cluster.

The web interface also provides a way to navigate the file system via the link named "Browse the filesystem". In this case, I followed the links until I reached the directory level that resembles my command line...

Figure: Hadoop File System Listing

From this perspective I can see sample.java, the program I placed onto the HDFS via the FS command. Selecting that link from this listing provides more details about the actual item in the directory...

Figure: Java code on the HDFS

As can be seen, the web interface actually displays the contents of the file. It also provides some simple commands such as "download the file", "tail the file", etc. From this perspective I can also see that the example Java program fits into a single Hadoop data block, which has successfully replicated to all 4 nodes across my network. From that single Hadoop FS put, the data I placed into the HDFS has successfully moved across all 4 data centers and is now consumable by Hadoop jobs, all in a matter of a minute or two.

This capability can be used as a simple backup to make data easily replicated and distributed across an organization's infrastructure. However that is by no means the only capability that Hadoop brings to the table.

In my next post, I will demonstrate how, by building on this simple concept, Hadoop can be used to obtain information that answers various questions.

 

Saturday, Mar 6, 2010

Maintaining PeopleSoft - Periodic Cleanliness Helps

Actually this applies to any large system, whether it be custom or vendor purchased. Generally speaking, everyone knows that to keep things manageable they have to perform periodic maintenance: cleaning out disk space, defragmenting memory, changing parameters for better resource management, etc. Not every organization agrees on when maintenance should occur or how much should be applied. The usage patterns and business need are the primary factors. The costs and risks involved in running production instances are other factors that have to be considered as well.

There are two major dimensions to any maintenance: the items to be maintained and the timing of the maintenance. A PeopleSoft system has quite a few things to consider:

  • Logs. There are many logs ranging from the web server to the application server. All of these logs need to be spooled off or cleaned from running production instances periodically based on how quickly they grow. Simple scripts can handle the work quite well.
  • Dumps. These happen. They are very useful at times to analyze problems but when the issues have been resolved these need to be moved from production as quickly as possible to save space. If dumps are not frequent nor of significant size, then they may not be as pressing.
  • Temporary files. These accumulate from time to time as well. It is important to remove any temporary files from reports, XML, errors, or whatever other process is creating them as soon as they are not needed. Also be judicious about their use. They take up space and processing power that could be better spent in the system.
  • Domains. Every so often domains need to be brought down and recycled. While it is true a PeopleSoft application can go on quite a while without bringing them down, the domains will actually degrade over time due to use. Ideally the production architecture has several domains in place with capacity to spare. Cycling through domains at planned intervals is the best way to continue business operations while refreshing domain resources.
  • Database. Whether it be Oracle, DB2, or MS SQL Server, the databases need their own maintenance as well, usually adjusting the various parameters for growth and defragmentation. Since PeopleSoft stores both metadata and application data in the same database, just in different structures, the database tends to be the most important area of maintenance. It is also the most frustrating for many companies, since the PeopleSoft schema is not necessarily managed by the native database in a way that could be considered compliant. For example, the lack of many things such as procedures, triggers, referential integrity, etc. tends to drive DBAs insane. This is because the nature of a PeopleSoft system is to have the application servers perform all that logic. Collaboration between PeopleSoft experts and the DBAs is essential to a healthy production instance.
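As an illustration of the kind of simple script mentioned for logs, here is a minimal Python sketch that compresses log files older than a cutoff into an archive directory and removes the originals. The *.log pattern, the 7-day default, and the function name are assumptions for illustration, not PeopleSoft conventions. It also assumes the files are no longer being written to, so logs held open by a running domain would need different handling.

```python
import gzip
import shutil
import time
from pathlib import Path


def archive_old_logs(log_dir, archive_dir, max_age_days=7, now=None):
    """Gzip *.log files older than max_age_days into archive_dir.

    The original file is removed after it has been compressed.
    Returns the list of archive paths that were created.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    archived = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        if log_file.stat().st_mtime >= cutoff:
            continue  # still recent; leave it in place
        target = archive_dir / (log_file.name + ".gz")
        with log_file.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        log_file.unlink()
        archived.append(target)
    return archived
```

A script like this would typically be run from a scheduler during a low-activity window, with the archive directory on storage outside the production instance.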

The timing of maintenance is made possible by the architecture, such as multiple domains and clustered databases, coupled with targeting low-activity periods within the system. In the end, caution around production applications will drive when and how maintenance activities occur.

Finding the right amount of maintenance depends on the amount of activity and growth the system bears over time. A high degree of instability, which can lead to unwanted service issues or even outages, is the most common sign that maintenance activities need to be reviewed in terms of comprehensiveness and timing. So plan, implement, review, and repeat as often as you can. A well-maintained production system can last years with no downtime.

Friday, Feb 26, 2010

PeopleSoft Logs - Finding Hidden Nuggets

In almost all shops that I visit for performance or business operations problems caused by technical issues, log files are an important and valuable source of data and information. Once one gets past basic questions such as "What are the symptoms?" or "What kind of application are you running?", many details start to emerge. One of the biggest challenges in any technical analysis comes from large production applications, whose shops either a) run their own frameworks and solutions or b) run others' frameworks and solutions. Group A usually has lots of internal experts who, while often extremely brilliant, have very little expertise in what happens to their solutions after they have been running in production for a while. The benefit of group A is that the experts are all close by, knowledgeable, and often very open-minded in finding a solution. Group B is the more classic kind, running technologies not of their own creation. They could be running PeopleSoft, Oracle, SAP, IBM, or even some of their own internal solutions mixed with any or all other types of solutions. These complex environments have both internal experts and external experts, in the form of the vendors themselves, to call upon.

In complex system interactions there are many places for useful information to hide. One of these places is the little gems known as log files. Pretty much every application and system under the sun produces them, and they are often captured for analysis. Some of the more advanced shops have their own tools, either developed in-house or acquired, that parse the logs and present information. Others are a little less sophisticated. However, log files are often an excellent way to look at events at various layers of a system to determine whether or not there are problems.

Log files are often hard to read. This is especially true for those produced by vendor applications, whose formats and data vary wildly from one to another. A PeopleSoft application is no exception. There are several different logs that one has to consider for a variety of situations:

  • Web Log. This is the typical web server log, whether it be WebSphere or another server. It holds the core information for all web interactions, which include both standard users and automated web interactions.
  • PeopleSoft Web Log. These logs are maintained on the PeopleSoft side and are usually kept in the webserv directory. They contain the stderr, the stdout, and the application gateway logs, which record web activity for a PeopleSoft application.
  • PeopleSoft middleware logs. These are a collection of logs used to describe what is occurring within the layers of the PeopleSoft application. They include:
    • Application Server Logs - Used to describe what is occurring within the application servers.
    • Tuxedo Logs - Used to describe what the transaction servers are actually doing from a resource and request level.
    • Dump Files - Usually only occurring when something strange has happened, these files usually appear in the application server logs for extreme system events.
    • REN (Realtime Event Notification) Server Logs - Similar to Application server logs but for REN events.
    • Process Scheduler Logs - A small collection of logs for each part of the process scheduler that describes what is happening at each level.
  • PeopleSoft Application Data. These are the tables and constructs such as the messaging constructs (PSAPMSG*) or the process scheduler (PSPRCS*) that have useful information that may or may not be inside the previously mentioned logs.

The first thing to notice is how many logs there are, and the second is how dispersed they are: they are spread across an entire infrastructure. Another point to consider is that some hint of a problem can often be found within these logs, but only if you set the logging levels to something meaningful. I am not talking about high levels of detail, but even summary information has its limits. Constant logging in a running production environment is typically seen as a "No-No" in terms of performance. However, logging adds little overhead if set properly, and background processes that periodically "clean and archive" the data for processing can minimize the disk worries. The overhead for a PeopleSoft environment is typically anywhere from 5-10%. This has to be weighed against the time it takes to identify a production issue.

In my experience, being proactive means having a good logging level and a means of capturing all that information, analyzing it, and presenting it so that you have, in addition to all of your existing tools, a holistic and historical view of what your application is doing relative to the business transactions.

Some issues, such as web-based transactions using the integration gateway, require traversing several different tiers. The reason is simple: the information about the layers of the integration is not all stored in the same place. Each layer has a different story to tell and can be of great help in determining problems. Based on this, I use the following process to look at integration issues:

  • Is the integration synchronous or asynchronous? They have different structures in the database in terms of what to look for.
    • Synchronous integrations are typically not logged unless one sets that up in the PeopleSoft web screens (called PIA). Setting these to at least "Log Header" will store the information in the database in PSIBLOGHDR and PSPUBCON. Setting it to "Log Detail" adds a lot more overhead and is typically unnecessary unless you want to look at the contents of a message itself.
    • Asynchronous integrations are usually logged. They are stored in different structures such as PSAPMSGSUBCON.
    • Errors for either type, if the PeopleSoft developers have put in error handling, will appear in PSAPMSGPUBERR or PSAPMSGSUBERR. Overall errors will be placed in PSIBLOGHDR.
  • Next the actual logs may contain information. Usually integration transactions run in their own application servers separate from standard transactions. Why? Because putting them both in a single domain usually overloads the application servers in all but the smallest of shops. Even if they are in the same domain, the logs will still contain additional information about their events.
  • Tuxedo will be able to present information on the messaging as well in terms of requests processed, requests currently waiting and the workload being processed. This is informative in the historical context against business transactions to see if there are any load issues occurring.
  • The gateway itself can produce information about transactions. There may be errors connecting to a service, there may be formatting issues, and there may even be Java memory issues for a particular request. These can be seen in the PSIGW for any PeopleSoft integrations.
  • The web server. If there are insufficient memory resources for Java or other issues, these will appear in the web server logs.
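The database part of this checklist can be captured as a small lookup so that a script or runbook always starts from the same tables. The sketch below is a hypothetical Python helper: the table names are copied from the checklist above, while the function name and structure are assumptions, and nothing here connects to a database.

```python
# Table names come from the checklist above; the mapping itself is illustrative.
TABLES_BY_MODE = {
    "synchronous": ["PSIBLOGHDR", "PSPUBCON"],
    "asynchronous": ["PSAPMSGSUBCON"],
}
ERROR_TABLES = ["PSAPMSGPUBERR", "PSAPMSGSUBERR", "PSIBLOGHDR"]


def tables_to_check(mode, include_errors=True):
    """Return the tables to inspect for an integration, without duplicates."""
    key = mode.strip().lower()
    if key not in TABLES_BY_MODE:
        raise ValueError(f"unknown integration mode: {mode!r}")
    candidates = list(TABLES_BY_MODE[key])
    if include_errors:
        candidates.extend(ERROR_TABLES)
    # Preserve the first occurrence of each table name, in order.
    return list(dict.fromkeys(candidates))
```

For a synchronous integration this returns PSIBLOGHDR first, which matches the checklist: that table only holds data when logging has been set to at least "Log Header".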

Most of the time, getting the basics is pretty easy if you have all the access, scripts, SQL, and information readily at hand. Typically this is not the case. I find that many shops have different people covering different areas, each area can have different requirements, and those requirements may or may not provide the level of data needed to make a determination. All of this takes time. In a dev or QA function, time may be flexible. In production outage situations, time is not on your side.

Having spent a lot of time helping companies solve production issues for their mission-critical applications, I find that the processes are not really all that different from one application to the next. Typically the challenge is finding the correlated data, piecing it together quickly and efficiently, and having the proper automated tools in place that can quickly answer a question. If people are being used for all issues, the time to find a problem is very long and the response is usually not proactive enough to address production problems. Many implementations fail to properly consider the amount of ongoing development needed to meet the ongoing SLA commitments for their projects.
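To make the idea of piecing correlated data together concrete, here is a minimal Python sketch that merges events from several tiers into a single timeline. It assumes every log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, which is a simplification: real web server, application server, and Tuxedo logs each use their own formats and would need their own parsers. The Event type and function names are hypothetical.

```python
import heapq
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    tier: str
    message: str


def parse_lines(tier, lines):
    """Yield Events from lines shaped like 'YYYY-MM-DD HH:MM:SS message'."""
    for line in lines:
        line = line.rstrip("\n")
        try:
            ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        yield Event(ts, tier, line[19:].strip())


def merge_tiers(streams):
    """Merge per-tier event streams into one timeline ordered by timestamp.

    Each input stream must already be in chronological order.
    """
    return heapq.merge(*streams, key=lambda event: event.timestamp)
```

Because heapq.merge is lazy, the merged timeline can be consumed without loading every log into memory, which matters when the logs are large.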

Good luck on your own implementations!