Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs

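Introduction

Log parsing converts raw log messages into a structured format so that logs from large-scale software systems can be analyzed automatically. Traditional log parsers typically rely on heuristics or handcrafted features, which may not generalize across log sources or may require extensive model tuning. Recently, some log parsers have drawn on the strong generative capabilities of large language models (LLMs). However, they depend heavily on demonstration examples, which makes LLM invocations costly. To address these issues, we propose LogBatcher, a cost-effective LLM-based log parser that requires no training process or labeled data. To exploit the latent characteristics of log data and reduce overhead, we divide logs into several partitions through clustering. We then perform a cache matching process that matches logs against previously parsed log templates. Finally, by batching a group of logs from each partition, we give the LLM better prompt context specialized for log parsing. Experiments on 16 public log datasets show that LogBatcher is effective and efficient for log parsing.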
Problem Addressed

Log parsing is an important step for automated analysis of logs of large-scale software systems, but traditional log parsers rely on heuristics or handcrafted features that may not generalize well. Recently, some log parsers have utilized powerful generative capabilities of large language models (LLMs), but they heavily rely on demonstration examples, resulting in substantial overhead in LLM invocations. This paper proposes a cost-effective LLM-based log parser that requires no training process or labeled data.
Key Idea

The proposed LogBatcher divides logs into partitions through clustering, performs a cache matching process to match logs with previously parsed log templates, and provides LLMs with better prompt context specialized for log parsing by batching a group of logs from each partition. This reduces overhead and leverages latent characteristics of log data.
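
The three steps described above (partition, cache match, batch query) can be illustrated with a minimal Python sketch. It is not the authors' implementation: `partition_logs` groups logs by token count and first token as a crude stand-in for the clustering step, `stub_llm` is a hypothetical placeholder that mimics an LLM call by turning tokens that differ across the batch into `<*>` wildcards, and every function name and the `batch_size` parameter are assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

WILDCARD = "<*>"


def partition_logs(logs: List[str]) -> List[List[str]]:
    """Assumed stand-in for clustering: group by token count and first token."""
    groups: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for log in logs:
        tokens = log.split()
        groups[(len(tokens), tokens[0] if tokens else "")].append(log)
    return list(groups.values())


def template_to_regex(template: str) -> re.Pattern:
    """Compile a template so that each wildcard matches one whitespace-free token."""
    parts = [re.escape(part) for part in template.split(WILDCARD)]
    return re.compile("^" + r"\S+".join(parts) + "$")


def stub_llm(batch: List[str]) -> str:
    """Hypothetical placeholder for an LLM call (no model is queried here).

    Token positions whose values differ across the batch become wildcards.
    """
    columns = zip(*(log.split() for log in batch))
    return " ".join(col[0] if len(set(col)) == 1 else WILDCARD for col in columns)


def parse_logs(
    logs: List[str],
    llm: Callable[[List[str]], str] = stub_llm,
    batch_size: int = 5,
) -> Dict[str, str]:
    """Map each log line to a template, querying `llm` only on cache misses."""
    cache: List[Tuple[str, re.Pattern]] = []
    result: Dict[str, str] = {}
    for partition in partition_logs(logs):
        # Cache matching: reuse templates that were already parsed.
        pending = []
        for log in partition:
            hit = next((tpl for tpl, rx in cache if rx.match(log)), None)
            if hit is None:
                pending.append(log)
            else:
                result[log] = hit
        # Batching: send a small group of distinct logs in a single query.
        while pending:
            batch = list(dict.fromkeys(pending))[:batch_size]
            template = llm(batch)
            rx = template_to_regex(template)
            cache.append((template, rx))
            matched = [log for log in pending if rx.match(log)]
            if not matched:
                # Guard: a template that matches nothing must not loop forever.
                matched = batch[:1]
            for log in matched:
                result[log] = template
            pending = [log for log in pending if log not in matched]
    return result


if __name__ == "__main__":
    sample = [
        "Connected to 10.0.0.1 port 22",
        "Connected to 10.0.0.2 port 8080",
        "Disk usage at 91%",
        "Connected to 10.0.0.3 port 443",
    ]
    for line, template in parse_logs(sample, batch_size=2).items():
        print(f"{line!r} -> {template!r}")
```

In this sketch the three `Connected to ...` lines share one partition, so a single batched query yields the template `Connected to <*> port <*>` for all of them, and the third line is covered without being sent to the model. A real system would replace `stub_llm` with an actual model call and the grouping key with a proper clustering method.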
Other Highlights

Experiments on 16 public log datasets show that LogBatcher is effective and efficient for log parsing. The paper also discusses the limitations of the proposed method and suggests future work, such as exploring the use of multiple LLMs and improving the cache matching process. The code is available on GitHub.
Related Work

Related work includes previous log parsing methods based on rule-based approaches, pattern-based approaches, and machine learning approaches. Some recent papers have also explored the use of LLMs for log parsing, but they require labeled data or a training process.
