Who is Altan Khendup?

A professional technologist who dabbles in innovative and interesting uses of technology, Mongolian history, philosophy, and cooking ethnic foods.

Often described as part philosopher, scholar, technologist, and mentor, Altan likes engaging in stimulating conversations with professionals, tackling problems with technology in a hands-on, collaborative manner, and enjoying the company of good friends and family.


Friday, February 26, 2010

PeopleSoft Logs - Finding Hidden Nuggets

In almost every shop I visit with performance or business-operations problems caused by technical issues, log files are an important and valuable source of data and information. Once one gets past basic questions such as "What are the symptoms?" or "What kind of application are you running?", many details start to emerge. One of the biggest challenges for any technical analysis is large production applications that run either a) their own frameworks and solutions or b) other people's frameworks and solutions. Group A usually has many internal experts who, while often extremely brilliant, have very little insight into what happens to their solutions after they have been running in production for a while. The benefit of group A is that its experts are close by, knowledgeable, and often very open-minded in finding a solution. Group B is the more classic kind, running technologies not of its own creation. These shops could be running PeopleSoft, Oracle, SAP, IBM, or even some of their own internal solutions mixed with any or all of these. Such complex environments have both internal experts and external experts, in the form of the vendors themselves, to call upon.

In complex system interactions there are many places for useful information to hide. One of these places is the little gems known as log files. Pretty much every application and system under the sun produces them, and they are often captured for analysis. Some of the more advanced shops have their own tools, either developed in-house or acquired, that parse the logs and present the information. Others are a little less sophisticated. Either way, log files are often an excellent way to look at events at the various layers of a system to determine whether or not there are problems.

Log files are often hard to read. This is especially true of those produced by vendor applications, whose formats and data vary wildly from one to another. A PeopleSoft application is no exception. It produces several different logs that one has to consider for a variety of situations:

  • Web Log. This is the log of the web server, whether it is WebSphere or another product. It holds the core information for all web interactions, including both standard users and automated web interactions.
  • PeopleSoft Web Log. These are maintained on the PeopleSoft side, usually in the webserv directory. They contain the stderr and stdout output as well as the application gateway logs, which record web activity for a PeopleSoft application.
  • PeopleSoft middleware logs. These are a collection of logs used to describe what is occurring within the layers of the PeopleSoft application. They include:
    • Application Server Logs - Used to describe what is occurring within the application servers.
    • Tuxedo Logs - Used to describe what the transaction servers are actually doing from a resource and request level.
    • Dump Files - Usually produced only when something strange has happened, these files typically appear alongside the application server logs during extreme system events.
    • REN (Realtime Event Notification) Server Logs - Similar to the application server logs, but for REN events.
    • Process Scheduler Logs - A small collection of logs for each part of the process scheduler that describes what is happening at each level.
  • PeopleSoft Application Data. These are the tables and constructs, such as the messaging constructs (PSAPMSG*) or the process scheduler (PSPRCS*), that hold useful information which may or may not appear in the previously mentioned logs.

The first thing to notice is how many logs there are, and the second is how dispersed they are: they are spread across the entire infrastructure. Another point to consider is that a hint of a problem can often be found within these logs, but only if the logging levels are set to something meaningful. I am not talking about high levels of detail, but even summary information has its limits. Constant, perpetual logging in a running production environment is typically seen as a "No-No" for performance. However, logging takes little overhead if set properly, and background processes that periodically "clean and archive" the data for processing can minimize disk worries. In a PeopleSoft environment the overhead is typically anywhere from 5-10%. That cost has to be weighed against the time it takes to identify a production issue.
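As an illustration of the "clean and archive" idea, here is a minimal Python sketch of such a background job. The function name, the default `*.log` file pattern, and the seven-day retention window are hypothetical choices for the example, not PeopleSoft settings. Run on a schedule, it compresses older files and removes the originals so that continuous logging does not fill the disk.

```python
import gzip
import shutil
import time
from pathlib import Path


def archive_old_logs(log_dir, archive_dir, pattern="*.log", max_age_days=7.0, now=None):
    """Gzip log files older than max_age_days into archive_dir, then delete the originals.

    Hypothetical helper: pattern and retention are assumptions; adjust per log type.
    Returns the list of archive paths created.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if path.stat().st_mtime >= cutoff:
            continue  # still recent: leave it in place for live analysis
        target = archive_dir / (path.name + ".gz")
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream the file so large logs are not held in memory
        path.unlink()  # reclaim disk space only after the compressed copy exists
        created.append(target)
    return created
```

Keeping the compressed copies, rather than simply deleting old files, preserves the historical record that later analysis depends on.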

In my experience, being proactive means having a good logging level and a means of capturing all that information, analyzing it, and presenting it so that, in addition to all of your existing tools, you have a holistic and historical view of what your application is doing relative to the business transactions.
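A minimal sketch of the "capture and analyze" step: it buckets error entries per hour so they can be lined up against business transaction volumes for a historical view. The line layout assumed here (`YYYY-MM-DD HH:MM:SS LEVEL message`) and the function name are hypothetical; each PeopleSoft log has its own format, so the parsing would need adapting for each one.

```python
from collections import Counter
from datetime import datetime


def hourly_error_counts(lines, fmt="%Y-%m-%d %H:%M:%S"):
    """Count ERROR entries per hour, assuming lines shaped 'DATE TIME LEVEL message'."""
    counts = Counter()
    for line in lines:
        parts = line.split(" ", 3)  # date, time, level, rest of message
        if len(parts) < 3:
            continue  # blank or malformed line
        try:
            stamp = datetime.strptime(parts[0] + " " + parts[1], fmt)
        except ValueError:
            continue  # not a timestamped entry, e.g. a stack-trace continuation
        if parts[2] == "ERROR":
            # Truncate to the hour so entries fall into hourly buckets.
            counts[stamp.replace(minute=0, second=0, microsecond=0)] += 1
    return counts
```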

Some issues, such as web-based transactions that use the integration gateway, require traversing several different tiers. The reason is simple: the information about the layers of an integration is not all stored in the same place. Each layer has a different story to tell and can be a great help in determining problems. Based on this, I used the following process to look at integration issues:

  • Is the integration synchronous or asynchronous? The two are stored in different database structures, so what to look for differs.
    • Synchronous transactions are typically not logged unless logging is set up in the PeopleSoft web screens (the PeopleSoft Internet Architecture, or PIA). Setting it to at least "Log Header" will store the information in the database in the PSIBLOGHDR and PSPUBCON tables. Setting it to "Log Detail" adds a lot more overhead and is typically unnecessary unless you want to look at the contents of a message itself.
    • Asynchronous transactions are usually logged, in different structures such as PSAPMSGSUBCON.
    • For either type, if the PeopleSoft developers have built in error handling, errors will appear in PSAPMSGPUBERR or PSAPMSGSUBERR. Overall errors are placed in PSIBLOGHDR.
  • Next, the logs themselves may contain information. Integration transactions usually run in their own application servers, separate from standard transactions. Why? Because putting both in a single domain usually overloads the application servers in all but the smallest of shops. Even if they share a domain, the logs will still contain additional information about their events.
  • Tuxedo can also present information on the messaging in terms of requests processed, requests currently waiting, and the workload being processed. Viewed historically against business transactions, this shows whether any load issues are occurring.
  • The gateway itself can produce information about transactions. There may be errors connecting to a service, formatting issues, or even Java memory issues for a particular request. These can be seen in the PSIGW logs for any PeopleSoft integration.
  • The web server. If there are insufficient memory resources for Java, or other issues, they will appear in its logs.
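The checklist above could be encoded so that a script or runbook walks the same tiers in the same order every time. This Python sketch only returns the places to look; the function name and its two flags are hypothetical, and the table names are simply the ones listed above, not a complete inventory.

```python
from typing import List


def integration_checkpoints(synchronous: bool, header_logging: bool = False) -> List[str]:
    """Return, in order, where to look when tracing an integration issue (hypothetical helper)."""
    steps = []
    if synchronous:
        if header_logging:
            steps.append("database: PSIBLOGHDR, PSPUBCON")
        else:
            # Synchronous traffic leaves no database trail unless PIA logging is on.
            steps.append("database: nothing logged (set at least 'Log Header' in PIA)")
    else:
        steps.append("database: PSAPMSGSUBCON")
    # Errors trapped by developer error handling, for either type.
    steps.append("database: PSAPMSGPUBERR, PSAPMSGSUBERR, PSIBLOGHDR")
    steps.append("application server logs (integration domain)")
    steps.append("Tuxedo: requests processed, requests waiting, workload")
    steps.append("integration gateway (PSIGW) logs")
    steps.append("web server logs (Java memory and other errors)")
    return steps
```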

Most of the time, getting the basics is pretty easy if you have all the access, scripts, SQL, and information readily at hand. Typically this is not the case. I find that many shops have different people covering different areas; each area can have different requirements, and those requirements may or may not capture the level of data needed to make a determination. All of this takes time. In a dev or QA function, time may be flexible. In a production outage, time is not on your side.

Having spent a lot of time helping companies solve production issues in their mission-critical applications, I find that the processes are not really all that different from one application to the next. Typically the challenge is finding the correlated data, piecing it together quickly and efficiently, and having the proper automated tools in place to answer a question quickly. If people handle every issue by hand, finding a problem takes a very long time, and the approach is usually not proactive enough to address production problems. Many implementations fail to properly consider the amount of ongoing development needed to meet their ongoing SLA commitments.
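Piecing correlated data together often comes down to lining events from each tier up on one timeline. Here is a minimal Python sketch, assuming each tier's log has already been parsed into `(timestamp, message)` pairs and that server clocks are roughly synchronized. It collects every event within a window around a reported incident time; the function name, tier labels, and two-minute default window are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Event = Tuple[datetime, str]  # (timestamp, message) from one tier's parsed log


def correlate(sources: Dict[str, List[Event]], anchor: datetime,
              window: timedelta = timedelta(minutes=2)) -> List[Tuple[datetime, str, str]]:
    """Merge events from every tier within +/- window of anchor, ordered oldest first."""
    merged = []
    for tier, events in sources.items():
        for stamp, message in events:
            if abs(stamp - anchor) <= window:
                merged.append((stamp, tier, message))
    merged.sort()  # chronological; ties fall back to tier name
    return merged
```

Clock skew between servers is the usual weak point of this approach; if the tiers' clocks drift, the window has to be widened accordingly.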

Good luck on your own implementations!