
Overview

Logs provide detailed records of events in your applications and infrastructure. The Observability Bundle includes Grafana Loki, a horizontally scalable log aggregation system that makes it easy to explore and analyze logs from all your Kubernetes clusters. Loki is inspired by Prometheus but designed specifically for logs. Instead of indexing log content (which is expensive), Loki indexes only metadata labels, making it cost-effective to store and query large volumes of logs.

Grafana Loki

Loki is a log aggregation system that collects, stores, and indexes logs from your Kubernetes clusters. It integrates seamlessly with Grafana for log visualization and querying.

Key Features

Cost-Effective Storage:
  • Indexes metadata labels, not log content
  • Significantly reduces storage and indexing costs compared to traditional log systems
  • Compresses log data efficiently
Label-Based Querying:
  • Uses the same label-based approach as Prometheus
  • Fast queries using metadata indexes
  • No need for expensive full-text indexes
Native Kubernetes Integration:
  • Automatically discovers pods and services
  • Extracts Kubernetes metadata as labels (namespace, pod, container)
  • Works with standard stdout/stderr container logs
Grafana Integration:
  • Explore logs directly in Grafana dashboards
  • Correlate logs with metrics and traces
  • LogQL query language similar to PromQL

How Loki Works

Container Logs (stdout/stderr)

    └─> Promtail/OTel Collector ──────> Grafana Loki
            (collects logs)               (stores logs)

                                              └──> Grafana
                                                 (visualizes logs)
Log Collection:
  1. Containers write logs to stdout/stderr
  2. Kubernetes stores logs on nodes
  3. Promtail or OpenTelemetry Collector reads log files
  4. Labels are extracted (namespace, pod, container, etc.)
  5. Logs are sent to Loki for storage
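As a concrete illustration (file path and names are hypothetical), the kubelet writes container logs under /var/log/pods/ on each node, and the collector derives Loki labels from that path:
# Log file written by the kubelet on the node
/var/log/pods/production_api-server-7d4f9c6b8-x2lvq_1f2e3d4c-5b6a-7988-9a0b-c1d2e3f4a5b6/api/0.log

# Labels the collector attaches to this log stream
{namespace="production", pod="api-server-7d4f9c6b8-x2lvq", container="api"}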
Log Querying:
  1. User queries logs in Grafana using LogQL
  2. Loki uses label indexes to find relevant log streams
  3. Loki filters and returns matching log lines
  4. Grafana displays results

LogQL Query Language

LogQL (Log Query Language) is used to query logs in Loki. It’s similar to PromQL but designed for log data.

Basic Queries

Select logs by labels:
{namespace="production"}

{app="api-server", env="prod"}

{namespace="production", pod=~"api-.*"}
Filter log content:
# Contains text
{namespace="production"} |= "error"

# Doesn't contain text
{namespace="production"} != "debug"

# Regex match
{app="api"} |~ "error|ERROR|Error"

# Case-insensitive match
{app="api"} |~ "(?i)error"

Advanced Queries

Parse structured logs:
# JSON logs
{app="api"} | json | status_code >= 400

# Logfmt (key=value format)
{app="api"} | logfmt | level="error"

# Pattern matching
{app="api"} | pattern `<method> <path> <status>`
Aggregations:
# Count logs
count_over_time({namespace="production"}[5m])

# Rate of logs
rate({app="api"} |= "error" [5m])

# Sum a parsed field
sum by (status_code) (
  rate({app="api"} | json | __error__="" [5m])
)
Multi-line queries:
# Errors with high status codes
sum by (pod) (
  count_over_time(
    {namespace="production"}
      | json
      | status_code >= 500 [5m]
  )
)

Configuration

Loki is configured through the Observability Bundle’s GitOps workflow.

Basic Configuration

enabled: true
valuesObject:
  # Storage backend
  storage:
    bucketNames:
      chunks: loki-chunks
      ruler: loki-ruler
    type: s3
    s3:
      endpoint: s3.amazonaws.com
      region: us-east-1

  # Retention period
  limits_config:
    retention_period: 30d

  # Ingester chunk settings
  ingester:
    chunk_idle_period: 1h
    max_chunk_age: 2h
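Note that, depending on the Loki version and chart defaults, retention_period is only enforced when retention is also enabled on the compactor. A minimal sketch to merge next to limits_config above:
  # Enforce deletion of expired chunks (assumption: the compactor runs in this deployment mode)
  compactor:
    retention_enabled: true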

Multi-Tenancy

For multi-cluster deployments, configure tenants to separate logs:
valuesObject:
  auth_enabled: true
  distributor:
    ring:
      kvstore:
        store: memberlist
Then send logs with an X-Scope-OrgID header identifying the cluster/tenant. For detailed configuration options, see the Loki Helm chart documentation.
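For example, Promtail can set the tenant per client via tenant_id, and the OTel Collector's Loki exporter can send the header directly. A sketch (the tenant name cluster-1 is illustrative):
# Promtail: per-client tenant
clients:
  - url: http://loki-gateway/loki/api/v1/push
    tenant_id: cluster-1

# OTel Collector: explicit header on the exporter
exporters:
  loki:
    endpoint: http://loki-gateway/loki/api/v1/push
    headers:
      X-Scope-OrgID: cluster-1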

Viewing Logs in Grafana

Accessing Logs

  1. Navigate to Grafana (access URL configured during Observability Bundle setup)
  2. Go to Explore in the left sidebar
  3. Select Loki as the data source
  4. Use the query builder or write LogQL queries

Common Use Cases

View recent logs for a pod:
  1. Select labels: namespace=production, pod=api-server-xyz
  2. Add time range (e.g., last 15 minutes)
  3. Optionally filter by keyword (e.g., “error”)
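The equivalent LogQL query looks like this (the pod name is illustrative):
{namespace="production", pod="api-server-xyz"} |= "error"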
Search for errors across all services:
{namespace="production"} |= "error" |= "ERROR" |= "Error"
View logs around a specific time:
  1. Use time picker to select date/time
  2. Adjust time range to ±5 minutes around the event
  3. Filter by relevant labels

Live Tail

Grafana supports live tailing of logs (similar to kubectl logs -f):
  1. In Explore view, click the Live button
  2. Logs will stream in real-time as they’re collected
  3. Use filters to narrow down which logs to tail
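From the command line, Loki's logcli tool can tail the same streams. A sketch, assuming logcli is installed locally and reusing the port-forward pattern shown later in this page (addresses are illustrative):
kubectl port-forward -n observability svc/loki-gateway 3100:80
export LOKI_ADDR=http://localhost:3100
logcli query --tail '{namespace="production", app="api"}'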

Log Collection

The Observability Bundle supports multiple log collection methods.

OpenTelemetry Collector

The OTel Collector can collect logs and forward them to Loki:
# OTel Collector configuration
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'

exporters:
  loki:
    endpoint: http://loki-gateway/loki/api/v1/push

service:
  pipelines:
    logs:
      # Wire the filelog receiver to the Loki exporter
      receivers: [filelog]
      exporters: [loki]
Benefits:
  • Single agent for metrics, traces, and logs
  • Unified configuration and management
  • Consistent labeling across telemetry types

Promtail (Loki Native)

Promtail is Loki’s purpose-built log collector:
enabled: true
valuesObject:
  config:
    clients:
      - url: http://loki-gateway/loki/api/v1/push
    positions:
      filename: /tmp/positions.yaml
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
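In practice the pod scrape config also needs relabel_configs so that Kubernetes metadata becomes Loki labels; a minimal sketch (the label names follow the common convention and may differ from your chart's defaults):
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_container_name]
            target_label: container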
Benefits:
  • Optimized for Loki
  • Lower resource usage
  • Native Kubernetes integration

Structuring Logs for Observability

Log Levels

Use consistent log levels across applications:
  • DEBUG: Detailed information for diagnosing issues
  • INFO: General informational messages
  • WARN: Warning messages for potentially harmful situations
  • ERROR: Error messages for failure scenarios
  • FATAL: Critical errors that cause application shutdown

Structured Logging

Use structured logging (JSON) rather than plain text.

Good - JSON structured logs:
{
  "level": "error",
  "timestamp": "2025-01-16T10:30:00Z",
  "message": "Database connection failed",
  "error": "connection timeout",
  "service": "api-server",
  "trace_id": "abc123"
}
Bad - Unstructured text:
[ERROR] 2025-01-16 10:30:00 - Database connection failed: connection timeout
Structured logs enable:
  • Filtering by fields (| json | level="error")
  • Aggregations (sum by (status_code))
  • Correlation with traces (trace_id)
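For example, with the JSON log above, queries like the following become possible (label and field names are illustrative):
# Only error-level lines from the api-server
{app="api-server"} | json | level="error"

# Error rate per service over 5 minutes
sum by (service) (rate({namespace="production"} | json | level="error" [5m]))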

Include Context

Add context to log messages:
  • User ID, request ID, trace ID
  • Operation being performed
  • Input parameters (sanitized)
  • Error codes or types

Correlating Logs with Traces

When using distributed tracing, include trace IDs in logs for correlation:
// Application code
logger.info("Processing request", {
  trace_id: span.context().traceId,
  user_id: request.userId,
  operation: "create_order"
});
In Grafana:
  1. View a trace in Tempo
  2. Click “Logs for this trace”
  3. Grafana automatically queries Loki for logs with matching trace ID
  4. See all logs related to that specific request
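The same correlation can also be done manually with a LogQL query that filters on the trace ID field (the trace ID value is illustrative):
{namespace="production"} | json | trace_id="abc123"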

Troubleshooting

Logs Not Appearing

Check the log collection agent:
# If using Promtail
kubectl logs -n observability -l app=promtail

# If using OTel Collector
kubectl logs -n observability -l app=opentelemetry-collector
Verify Loki is running:
kubectl get pods -n observability | grep loki
Check Loki ingester logs:
kubectl logs -n observability -l app=loki -c ingester
Common issues:
  • Network connectivity between collector and Loki
  • Loki ingestion rate limits hit
  • Incorrect labels causing streams to be dropped
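If rate limits are the cause (the collector logs typically show 429 or "rate limited" errors), they can be raised in Loki's limits_config; a sketch with illustrative values, not recommendations:
limits_config:
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  per_stream_rate_limit: 5MB
  per_stream_rate_limit_burst: 10MB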
High Memory Usage

Loki memory usage scales with:
  • Number of active streams (unique label combinations)
  • Ingestion rate (logs per second)
  • Query load
Solutions:
  • Reduce label cardinality (avoid high-cardinality labels like request IDs)
  • Decrease retention period
  • Increase chunk idle period to batch more data before flushing
  • Add more Loki ingesters for horizontal scaling
Check stream cardinality:
# Access Loki metrics
kubectl port-forward -n observability svc/loki-gateway 3100:80
curl http://localhost:3100/metrics | grep loki_ingester_streams
Slow Queries

Optimize queries:
  • Use specific labels to narrow search space
  • Avoid querying very long time ranges
  • Use aggregations instead of returning raw logs when possible
Good query (specific labels, queried over a short time range such as 1 hour):
{namespace="production", app="api"} |= "error"
Bad query (too broad; searches all apps over 24 hours):
{namespace="production"}
Note that the time range comes from Grafana's time picker (or a range selector inside an aggregation function), not from appending a duration to a plain log query.
Check query performance:
  • Grafana shows query execution time
  • Loki query stats show chunks scanned
Solutions:
  • Add more specific labels
  • Reduce time range
  • Use log aggregation rules for common queries
Missing Logs for Specific Pods

Check pod labels:
kubectl get pods -n my-namespace --show-labels
Verify log collector is scraping pod:
# Check Promtail targets
kubectl port-forward -n observability svc/promtail 3101:3101
curl http://localhost:3101/targets
Common issues:
  • Pod doesn’t match log collector’s label selectors
  • Pod is in a namespace not being monitored
  • Logs are being written to files instead of stdout/stderr

Best Practices

Use Structured Logging

Always use JSON or structured log formats. This enables filtering, parsing, and aggregation in LogQL queries.

Control Label Cardinality

Keep the number of unique label combinations low. Avoid labels with values like request IDs, timestamps, or user IDs.
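For example, rather than attaching a request ID as a label (which creates a new stream per request), keep it in the log line and filter for it at query time (the field name is illustrative):
# Avoid: one stream per request
{app="api", request_id="8f3a9c"}

# Prefer: filter the field inside a low-cardinality stream
{app="api"} | json | request_id="8f3a9c"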

Include Trace IDs

Add trace IDs to logs for correlation with distributed traces. This enables powerful debugging workflows.

Set Appropriate Retention

Balance storage costs with retention needs. Typical: 7-30 days for logs, longer for critical systems.

Next Steps