Parse Complex Logs with LogListener Composite Parsing Pipelines

A practical guide to delimiter, JSON, drop, metadata, and custom processor chains for structured log ingestion

As log formats grow with the business, a single parser is rarely enough. A line may start with a timestamp, contain a delimiter-separated request segment, embed JSON in the middle, and end with fields that should never be indexed. LogListener composite parsing solves that problem at the collection layer by running processors in order before the event is written to Cloud Log Service (CLS).

The pattern is simple: split a raw record into named fields, apply the right processor to each field, then emit only the structured fields that are useful for search, analytics, and alerting.

In practice, the pipeline works as a staged processor tree:

| Stage | What happens | Typical output |
| --- | --- | --- |
| 1. First split | The raw log line is split into coarse fields such as time, msg1, msg2, or msg3. | Named intermediate fields. |
| 2. Field-level processing | Each intermediate field is handled by the processor that matches its format: delimiter, JSON, key-value, regular expression, metadata, time format, or drop. | Cleaned and expanded fields. |
| 3. Final emission | Temporary fields, markers, and unused fragments are removed. | A compact structured event ready for search, SQL analysis, alerting, and dashboards. |
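
The three stages can be sketched in a few lines of Python. This is only an illustration of the processing order, not LogListener's implementation: the function names (`split_stage`, `run_pipeline`, `expand_pipe`), the sample line, and the `host`, `code`, and `marker` field names are all hypothetical.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]
Processor = Callable[[Event], Event]


def split_stage(raw: str, delimiter: str, keys: List[str]) -> Event:
    """Stage 1: split the raw line into named intermediate fields."""
    return dict(zip(keys, raw.split(delimiter)))


def run_pipeline(raw: str, delimiter: str, keys: List[str],
                 field_processors: List[Processor],
                 drop_keys: List[str]) -> Event:
    event = split_stage(raw, delimiter, keys)
    # Stage 2: each field-level processor rewrites the event, in order.
    for process in field_processors:
        event = process(event)
    # Stage 3: remove temporary fields before the event is emitted.
    return {k: v for k, v in event.items() if k not in drop_keys}


def expand_pipe(event: Event) -> Event:
    """Hypothetical field-level step: split msg1 on '|' into two fields."""
    out = dict(event)
    host, code = out.pop("msg1").split("|")
    out.update({"host": host, "code": code})
    return out


result = run_pipeline("t0,web-1|200,END", ",", ["time", "msg1", "marker"],
                      [expand_pipe], ["marker"])
print(result)  # {'time': 't0', 'host': 'web-1', 'code': '200'}
```

Keeping each stage as a separate function mirrors the way child processors are nested under a parent processor in the configurations that follow.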

When composite parsing is the right tool

| Problem | Composite parsing approach |
| --- | --- |
| One log line contains multiple formats | Run a first-stage parser, then parse selected fields again with JSON, delimiter, key-value, or regular expression processors. |
| Some parsed fields should not be reported | Use processor_drop to remove unneeded fields before ingestion. |
| Metadata or path information should become searchable | Use metadata extraction and a second-stage processor to turn filename or path parts into tags. |
| A full line needs several transformations | Chain processors under a parent processor so each stage runs in sequence. |

The processor catalog is easiest to treat as a small toolbox rather than a single parser choice:

| Processor family | Use it for |
| --- | --- |
| Delimiter split | Break a line or field into ordered segments with a stable separator. |
| JSON parsing | Expand a JSON object embedded inside a field into separate key-value fields. |
| Key-value split | Turn key:value or similar pairs into structured fields. |
| Full regular expression | Extract fields from irregular or path-like content when delimiter parsing is not enough. |
| Time format | Convert timestamp text or epoch values into a normalized event time. |
| Metadata processor | Read collector-side metadata such as filename and turn it into tags or fields. |
| Drop processor | Remove temporary, empty, or unneeded fields before the record is reported. |

Example 1: drop fields after parsing

If an event has already been parsed into three key-value fields but only key2 matters, drop key1 and key3 before the event is reported.

Parsed fields:

key1:value1
key2:value2
key3:value3

Processor configuration:

{
  "processors": [
    {
      "type": "processor_drop",
      "detail": {
        "SourceKey": ["key1", "key3"]
      }
    }
  ]
}

The resulting event keeps only the useful field:

key2:value2
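
The effect of the drop step can be checked locally with a small Python sketch. It only emulates the outcome described above; `drop_fields` is a hypothetical helper, not part of LogListener.

```python
def drop_fields(event: dict, source_keys: list) -> dict:
    """Return a copy of the event without the listed fields."""
    unwanted = set(source_keys)
    return {k: v for k, v in event.items() if k not in unwanted}


event = {"key1": "value1", "key2": "value2", "key3": "value3"}
kept = drop_fields(event, ["key1", "key3"])
print(kept)  # {'key2': 'value2'}
```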

Example 2: add metadata from a file path

Composite parsing can also enrich a log record with metadata. In this pattern, the file path contains the application, version, log directory, and log name. The pipeline first splits the content by comma, then extracts the filename metadata and applies a regular expression to create tags.

Raw log and source file path:

value1,value2
path: /usr/local/loglistener-2.7.4/testdir/test.log

Processor configuration:

{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "Delimiter": ",",
        "ExtractKeys": ["msg1", "msg2"]
      }
    },
    {
      "type": "meta_processor",
      "detail": {
        "ExtractKeys": ["FILENAME"]
      },
      "processors": [
        {
          "type": "processor_fullregex",
          "detail": {
            "KeepSource": false,
            "SourceKey": "FILENAME",
            "ExtractRegex": "/\\w+/\\w+/(\\w+)-([^/]+)/(\\w+)/(\\w+).*",
            "ExtractKeys": ["app", "ver", "logdir", "logname"]
          }
        }
      ]
    }
  ]
}

Parsed result:

msg1:value1
msg2:value2
__TAG__.app: loglistener
__TAG__.ver: 2.7.4
__TAG__.logname: test
__TAG__.logdir: testdir

The metadata processor requires LogListener 2.7.4 or later in this scenario.
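
Before deploying a path regex, it helps to test it against a real path. The sketch below does that with Python's `re` module, writing the pattern with a `.*` tail so that the `.log` suffix is consumed. It assumes the expression must match the whole path, and Python's regex engine may differ in detail from the one LogListener uses, so treat it as a quick sanity check rather than a guarantee.

```python
import re

PATH = "/usr/local/loglistener-2.7.4/testdir/test.log"
# Same expression as in the configuration, without the JSON escaping.
PATTERN = r"/\w+/\w+/(\w+)-([^/]+)/(\w+)/(\w+).*"
KEYS = ["app", "ver", "logdir", "logname"]

match = re.fullmatch(PATTERN, PATH)
# Pair each capture group with its key, in the order of ExtractKeys.
tags = dict(zip(KEYS, match.groups())) if match else {}
print(tags)
# {'app': 'loglistener', 'ver': '2.7.4', 'logdir': 'testdir', 'logname': 'test'}
```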

Example 3: split, convert time, split again, and parse key-value content

A custom processor chain can expand nested content and keep the result searchable.

Raw log:

1571394459,http://127.0.0.1/my/course/4|10.135.46.111|200,status:DEAD,

Processor configuration:

{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "Delimiter": ",",
        "ExtractKeys": ["time", "msg1", "msg2"]
      },
      "processors": [
        {
          "type": "processor_timeformat",
          "detail": {
            "KeepSource": true,
            "TimeFormat": "%s",
            "SourceKey": "time"
          }
        },
        {
          "type": "processor_split_delimiter",
          "detail": {
            "KeepSource": false,
            "Delimiter": "|",
            "SourceKey": "msg1",
            "ExtractKeys": ["submsg1", "submsg2", "submsg3"]
          }
        },
        {
          "type": "processor_split_key_value",
          "detail": {
            "KeepSource": false,
            "Delimiter": ":",
            "SourceKey": "msg2"
          }
        }
      ]
    }
  ]
}

Parsed result:

time: 1571394459
submsg1: http://127.0.0.1/my/course/4
submsg2: 10.135.46.111
submsg3: 200
status: DEAD
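
The same chain can be reproduced step by step in Python to confirm which fields to expect. This is a sketch that mirrors the configuration's order of operations; it is not LogListener code, and it assumes that `%s` means epoch seconds and that the empty segment after the trailing comma is ignored.

```python
from datetime import datetime, timezone

RAW = "1571394459,http://127.0.0.1/my/course/4|10.135.46.111|200,status:DEAD,"

# First split on ',' into time, msg1, msg2 (the empty tail is ignored here).
time_s, msg1, msg2 = RAW.split(",")[:3]

# KeepSource: true on the time step, so the original text stays in the event.
event = {"time": time_s}

# Time format "%s": interpret the text as epoch seconds (shown in UTC).
event_time = datetime.fromtimestamp(int(time_s), tz=timezone.utc)

# Second split on '|' into three sub-fields; msg1 itself is not kept.
event.update(zip(["submsg1", "submsg2", "submsg3"], msg1.split("|")))

# Key-value split on ':'; msg2 itself is not kept.
key, value = msg2.split(":", 1)
event[key] = value

print(event_time.isoformat())
print(event)
```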

Example 4: parse a real nested access log

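A more realistic case uses / as the first delimiter. The timestamp is kept as time, the JSON payload in msg3 is expanded, and the log_start marker (msg2), the log_end marker (msg4), and the empty tail (msg5) are dropped. Note that the JSON payload in this sample also contains / characters (in upstreamhost, url, request, referer, and agent), so a plain split on / would cut inside it. Verify the split result on a test collection, or prefer a first delimiter that cannot occur inside the payload.

Raw log: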

2016-01-02 12:59:59/log_start/{"remote_ip":"10.135.46.111","body_sent":23,"responsetime":0.232,"upstreamtime":"0.232","upstreamhost":"unix:/tmp/php-cgi.sock","http_host":"127.0.0.1","method":"POST","url":"/event/dispatch","request":"POST /event/dispatch HTTP/1.1","xff":"-","referer":"http://127.0.0.1/my/course/4","agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:64.0) Gecko/20100101 Firefox/64.0","response_code":"200"}/log_end/

Processor configuration:

{
  "processors": [
    {
      "type": "processor_split_delimiter",
      "detail": {
        "KeepSource": false,
        "Delimiter": "/",
        "ExtractKeys": ["time", "msg2", "msg3", "msg4", "msg5"]
      },
      "processors": [
        {
          "type": "processor_drop",
          "detail": { "SourceKey": "msg2" }
        },
        {
          "type": "processor_json",
          "detail": {
            "KeepSource": false,
            "SourceKey": "msg3"
          }
        },
        {
          "type": "processor_drop",
          "detail": { "SourceKey": "msg4" }
        },
        {
          "type": "processor_drop",
          "detail": { "SourceKey": "msg5" }
        }
      ]
    }
  ]
}

The final structured output contains searchable request attributes such as remote_ip, method, url, response_code, responsetime, upstreamhost, upstreamtime, referer, agent, and xff.
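
To check the expected fields locally, the sketch below parses a shortened version of the sample line in Python. Because the JSON payload itself contains / characters, it peels the fixed fields off both ends instead of splitting on every /. This is an assumption made for the local check only and says nothing about how LogListener splits the line; the shortened payload is also an illustration, not the full sample.

```python
import json

# Shortened version of the sample line; the url value contains '/' on purpose.
RAW = ('2016-01-02 12:59:59/log_start/'
       '{"remote_ip":"10.135.46.111","url":"/event/dispatch","response_code":"200"}'
       '/log_end/')

# A plain split on '/' yields more than the five segments the config names.
naive_segments = RAW.split("/")

# Peel the two leading fields, then the two trailing fields, so that
# slashes inside the JSON payload are left untouched.
time_s, start_marker, rest = RAW.split("/", 2)
payload, end_marker, tail = rest.rsplit("/", 2)

# Keep time, expand the JSON, and drop the markers and the empty tail.
event = {"time": time_s}
event.update(json.loads(payload))

print(len(naive_segments))  # 7
print(event)
```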

Operational checklist

  • Keep the first split deterministic. The first-stage delimiter decides which nested content each processor can see.
  • Use KeepSource: false when the raw intermediate field should not remain in the event.
  • Drop placeholder fields as early as possible, especially start markers, end markers, and empty tails.
  • Keep processor configurations as text or code blocks rather than images, so that a search for a processor name can find them.
  • Test the final record against the fields your dashboards, alert rules, and SQL queries expect.
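
The last checklist item can be automated with a small field check. The sketch below is one possible shape for such a test; `missing_fields`, the `EXPECTED` set, and the sample event are hypothetical and should be replaced with the fields your own dashboards, alert rules, and queries rely on.

```python
def missing_fields(event: dict, expected: set) -> set:
    """Return the expected field names that the parsed event lacks."""
    return expected - set(event)


# Hypothetical list of fields that downstream queries depend on.
EXPECTED = {"time", "remote_ip", "method", "url", "response_code"}

event = {
    "time": "2016-01-02 12:59:59",
    "remote_ip": "10.135.46.111",
    "method": "POST",
    "url": "/event/dispatch",
}

missing = sorted(missing_fields(event, EXPECTED))
print(missing)  # ['response_code']
```

Running a check like this against a few sample lines after every configuration change catches a renamed or dropped field before it breaks an alert rule.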

FAQ

Can one LogListener pipeline combine delimiter parsing and JSON parsing?

Yes. A parent delimiter parser can split the line into named fields, and a child JSON processor can expand only the field that contains JSON.

Why drop fields at the collector instead of filtering later?

Dropping fields at collection time reduces event size and avoids indexing fields that are not useful for operations, analytics, or audit workflows.