
Owner’s Corner: Cleaning Up The Logs

We produce tons of log files that are absolutely unruly. I've attached a sample of the logs, which covers only 4 days: sample.log. How can you make any sense of that? Fortunately, we've developed a system for cleaning those up semi-automatically.

Regular expressions (regex) are basically a way of matching patterns in strings (text), including the parts that vary from one line to the next. You can read all about them here. For a long time, I'd actually download the log files, then run "replace this regular expression with nothing" one by one for each pattern inside Notepad++ or gedit. How many patterns do we use in our logs? Here's the document that was our guide for a long time, with all the recurring unnecessary patterns:

Spoiler Inside

That worked, but it was quite frankly tedious. That's why it didn't get done for a long time: it took an hour or so per session. So I decided to try to automate the process with a script, and thus far it's worked quite well. Let's call this version a "work in progress," since I haven't fully sorted out the edit-stop management and integration with the final stored logs.

It primarily uses the Linux command sed with stdout redirection. The command "cat" prints a file; sed then reads that output, does the replacements via regex, and writes the edited output to a temporary file. Once ALL the regex replacements have been made, the script strips out the blank lines and appends the cleaned log to the end of the existing "full server log."
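To give a rough idea of that flow, here is a minimal sketch. It is not the actual script: the function name clean_log, the file names, the two patterns, and the sample log lines are all made up for illustration.

```shell
#!/bin/sh
set -eu

# clean_log RAW FULL
# Hypothetical helper: strips unwanted patterns from RAW and appends
# the cleaned result to FULL (standing in for the "full server log").
clean_log() {
    raw_log=$1
    full_log=$2
    tmp_log=$(mktemp)

    # cat prints the file, sed replaces each matching pattern with
    # nothing, and stdout redirection writes the result to a temp file.
    # Both patterns are invented examples, not the real pattern list.
    cat "$raw_log" \
        | sed -e 's/^.*\[DEBUG\].*$//' \
              -e 's/^.*keepalive ping$//' \
        > "$tmp_log"

    # Drop the blank lines left behind, then append (>>) to the full log.
    sed '/^[[:space:]]*$/d' "$tmp_log" >> "$full_log"
    rm -f "$tmp_log"
}

# Demo on an invented four-line log.
demo_dir=$(mktemp -d)
printf '%s\n' \
    '12:00:01 [INFO] alice logged in' \
    '12:00:02 [DEBUG] cache refreshed' \
    '12:00:03 [INFO] keepalive ping' \
    '12:00:04 [INFO] alice logged out' \
    > "$demo_dir/sample.log"

clean_log "$demo_dir/sample.log" "$demo_dir/full_server.log"
cat "$demo_dir/full_server.log"
```

In this sketch only the login and logout lines survive. Writing to a temporary file first means the full log is only touched once every replacement has run.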

The meat of this, though, is the regex commands we used. sed regex is different from Notepad++ regex, so I had to heavily modify the syntax for each regex match. Here's the script as it exists today:

**AS I UPDATE THE SCRIPT I'LL UPDATE THIS POST!

Spoiler Inside
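As a small illustration of the kind of syntax translation involved, here is a sketch that ports one pattern from Notepad++ style to sed. The pattern, the function names, and the sample line are hypothetical and are not taken from the real pattern list. The main differences it shows: POSIX regex as used by sed does not define the \d or \s shorthands, and in sed's default "basic" syntax the repetition braces have to be escaped, while sed -E ("extended" syntax) lets you write them bare.

```shell
#!/bin/sh
set -eu

# Hypothetical Notepad++ pattern being ported (not from the post):
#   ^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+

# Basic regex (sed's default): use [0-9] and [[:space:]] instead of
# \d and \s, and escape the repetition braces.
strip_stamp_bre() {
    sed 's/^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}[[:space:]]\{1,\}//'
}

# Extended regex (sed -E): braces and + work unescaped, but the
# character classes still have to be spelled out.
strip_stamp_ere() {
    sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}[[:space:]]+//'
}

line='2013-05-04 12:00:01 server started'   # invented sample line
printf '%s\n' "$line" | strip_stamp_bre
printf '%s\n' "$line" | strip_stamp_ere
```

Both functions strip the leading timestamp and leave the rest of the line alone. Some sed builds accept extra shorthands as extensions, so it is worth checking the local man page before relying on them.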

The end result is a nice, pretty, clean log file which we can store: sample.clean.log. I still need to replace the code in the backup script that directs the logs to the full file so that it calls this script instead, and then I need to comment this monster, because it's unruly.