Overview
Logs provide detailed records of events in your applications and infrastructure. The Observability Bundle includes Grafana Loki, a horizontally scalable log aggregation system that makes it easy to explore and analyze logs from all your Kubernetes clusters. Loki is inspired by Prometheus but designed specifically for logs. Instead of indexing log content (which is expensive), Loki indexes only metadata labels, making it cost-effective to store and query large volumes of logs.
Grafana Loki
Loki is a log aggregation system that collects, stores, and indexes logs from your Kubernetes clusters. It integrates seamlessly with Grafana for log visualization and querying.
Key Features
Cost-Effective Storage:
- Indexes metadata labels, not log content
- Significantly reduces storage and indexing costs compared to traditional log systems
- Compresses log data efficiently
Prometheus-Style Indexing:
- Uses the same label-based approach as Prometheus
- Fast queries using metadata indexes
- No need for expensive full-text indexes
Kubernetes Native:
- Automatically discovers pods and services
- Extracts Kubernetes metadata as labels (namespace, pod, container)
- Works with standard stdout/stderr container logs
Grafana Integration:
- Explore logs directly in Grafana dashboards
- Correlate logs with metrics and traces
- LogQL query language similar to PromQL
How Loki Works
- Containers write logs to stdout/stderr
- Kubernetes stores logs on nodes
- Promtail or OpenTelemetry Collector reads log files
- Labels are extracted (namespace, pod, container, etc.)
- Logs are sent to Loki for storage
- User queries logs in Grafana using LogQL
- Loki uses label indexes to find relevant log streams
- Loki filters and returns matching log lines
- Grafana displays results
LogQL Query Language
LogQL (Log Query Language) is used to query logs in Loki. It’s similar to PromQL but designed for log data.
Basic Queries
Select logs by labels:
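As a minimal sketch, a query is a stream selector plus optional filters; the label names and values below (namespace, app) are illustrative and depend on how your logs are labeled:

```logql
{namespace="production"}

{namespace="production", app="api-server"} |= "error"
```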
Advanced Queries
Parse structured logs:
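A sketch using LogQL’s json parser and a metric-style aggregation; the field names (level, status_code) are examples and assume JSON-formatted log lines:

```logql
{app="api-server"} | json | level="error"

sum by (status_code) (count_over_time({app="api-server"} | json [5m]))
```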
Configuration
Loki is configured through the Observability Bundle’s GitOps workflow.
Basic Configuration
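As a minimal sketch, a values override for the Loki Helm chart might set single-tenant mode and retention along these lines; the exact keys vary by chart version, so check the chart’s values.yaml before applying anything:

```yaml
# Illustrative values for the grafana/loki Helm chart (keys vary by chart version).
loki:
  auth_enabled: false          # single-tenant mode
  limits_config:
    retention_period: 168h     # keep logs for 7 days
  compactor:
    retention_enabled: true    # let the compactor delete expired chunks
```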
Multi-Tenancy
For multi-cluster deployments, configure tenants to separate logs: each cluster’s log collector sends an X-Scope-OrgID header identifying the cluster/tenant.
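For instance, with Promtail the tenant is set per client and is sent as the X-Scope-OrgID header; the URL and tenant name below are placeholders:

```yaml
# Illustrative Promtail client configuration; tenant_id becomes the
# X-Scope-OrgID header on push requests to Loki.
clients:
  - url: http://loki-gateway.monitoring.svc/loki/api/v1/push
    tenant_id: cluster-production   # hypothetical tenant name
```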
For detailed configuration options, see the Loki Helm chart documentation.
Viewing Logs in Grafana
Accessing Logs
- Navigate to Grafana (access URL configured during Observability Bundle setup)
- Go to Explore in the left sidebar
- Select Loki as the data source
- Use the query builder or write LogQL queries
Common Use Cases
View recent logs for a pod (see the query sketch after this list):
- Select labels: namespace=production, pod=api-server-xyz
- Add a time range (e.g., last 15 minutes)
- Optionally filter by keyword (e.g., “error”)
Investigate an event at a specific time:
- Use the time picker to select the date/time
- Adjust the time range to ±5 minutes around the event
- Filter by relevant labels
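The equivalent LogQL for the first use case might look like this; the label values are the example ones from the list above:

```logql
{namespace="production", pod="api-server-xyz"} |= "error"
```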
Live Tail
Grafana supports live tailing of logs (similar to kubectl logs -f):
- In Explore view, click the Live button
- Logs will stream in real-time as they’re collected
- Use filters to narrow down which logs to tail
Log Collection
The Observability Bundle supports multiple log collection methods:
OpenTelemetry Collector (Recommended)
The OTel Collector can collect logs and forward them to Loki:
- Single agent for metrics, traces, and logs
- Unified configuration and management
- Consistent labeling across telemetry types
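A rough sketch of what a collector’s logs pipeline could look like; the receiver and exporter choices (filelog, otlphttp) and the Loki OTLP endpoint depend on your collector distribution and Loki version, so treat this as illustrative only, not the bundle’s shipped configuration:

```yaml
# Illustrative OTel Collector logs pipeline (not the bundle's shipped config).
receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]
exporters:
  otlphttp:
    endpoint: http://loki-gateway.monitoring.svc:3100/otlp   # Loki OTLP ingest (Loki 3.x)
service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlphttp]
```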
Promtail (Loki Native)
Promtail is Loki’s purpose-built log collector:
- Optimized for Loki
- Lower resource usage
- Native Kubernetes integration
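As a sketch, a Promtail scrape config typically discovers pods via Kubernetes service discovery and maps pod metadata to Loki labels; the bundle may ship its own defaults, so the snippet below is illustrative:

```yaml
# Illustrative Promtail scrape config using Kubernetes service discovery.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```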
Structuring Logs for Observability
Log Levels
Use consistent log levels across applications:
- DEBUG: Detailed information for diagnosing issues
- INFO: General informational messages
- WARN: Warning messages for potentially harmful situations
- ERROR: Error messages for failure scenarios
- FATAL: Critical errors that cause application shutdown
Structured Logging
Use structured logging (JSON) rather than plain text; a sketch of a well-structured JSON log line follows the list below. Structured logs enable:
- Filtering by fields (| json | level="error")
- Aggregations (sum by (status_code))
- Correlation with traces (trace_id)
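For example, a JSON log line like the following (all field names and values are illustrative) can be parsed directly with | json in LogQL:

```json
{
  "timestamp": "2024-05-14T10:32:01Z",
  "level": "error",
  "message": "failed to process payment",
  "status_code": 502,
  "request_id": "req-8f3a",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"
}
```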
Include Context
Add context to log messages:
- User ID, request ID, trace ID
- Operation being performed
- Input parameters (sanitized)
- Error codes or types
Correlating Logs with Traces
When using distributed tracing, include trace IDs in logs for correlation:
- View a trace in Tempo
- Click “Logs for this trace”
- Grafana automatically queries Loki for logs with matching trace ID
- See all logs related to that specific request
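Behind the scenes this is a label/field match; a manual equivalent might be a query like the following, where the trace_id value is a placeholder:

```logql
{namespace="production"} | json | trace_id="4bf92f3577b34da6a3ce929d0e0e4736"
```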
Troubleshooting
No logs appearing in Loki
Check the log collection agent, verify that Loki is running, and inspect the Loki ingester logs (a sketch of these kubectl checks follows the list below). Common issues:
- Network connectivity between collector and Loki
- Loki ingestion rate limits hit
- Incorrect labels causing streams to be dropped
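The checks might look roughly like this; the namespace and label selectors are assumptions and depend on how the bundle was installed:

```bash
# Illustrative checks; adjust namespace and selectors to your installation.
kubectl get pods -n monitoring -l app.kubernetes.io/name=opentelemetry-collector   # log collection agent
kubectl get pods -n monitoring -l app.kubernetes.io/name=loki                      # Loki itself
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100      # ingester logs
```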
High memory usage in Loki
Loki memory usage scales with:
- Number of active streams (unique label combinations)
- Ingestion rate (logs per second)
- Query load
To reduce memory usage:
- Reduce label cardinality (avoid high-cardinality labels like request IDs)
- Decrease retention period
- Increase chunk idle period to batch more data before flushing
- Add more Loki ingesters for horizontal scaling
Slow log queries
Optimize queries (a sketch contrasting an overly broad query with a narrower one follows the lists below):
- Use specific labels to narrow the search space
- Avoid querying very long time ranges
- Use aggregations instead of returning raw logs when possible
Check query performance:
- Grafana shows query execution time
- Loki query stats show chunks scanned
If queries are still slow:
- Add more specific labels
- Reduce the time range
- Use log aggregation rules for common queries
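For instance, the first query below is the kind of overly broad pattern to avoid (it scans every stream in the namespace and relies on a text filter alone), while the second narrows the stream selector and parses only error-level lines; all label names are illustrative:

```logql
{namespace="production"} |= "error"

{namespace="production", app="api-server", container="api"} | json | level="error"
```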
Missing logs for specific pods
Check the pod’s labels and verify that the log collector is scraping the pod (a sketch of these checks follows the list below). Common issues:
- Pod doesn’t match log collector’s label selectors
- Pod is in a namespace not being monitored
- Logs are being written to files instead of stdout/stderr
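The checks could be sketched as follows; the pod, namespace, and collector names are placeholders:

```bash
# Illustrative checks; substitute your own pod, namespace, and collector names.
kubectl get pod api-server-xyz -n production --show-labels                              # check pod labels
kubectl logs -n monitoring -l app.kubernetes.io/name=promtail --tail=200 | grep api-server-xyz   # is the collector picking it up?
```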
Best Practices
Use Structured Logging
Always use JSON or structured log formats. This enables filtering, parsing, and aggregation in LogQL queries.
Control Label Cardinality
Keep the number of unique label combinations low. Avoid labels with values like request IDs, timestamps, or user IDs.
Include Trace IDs
Add trace IDs to logs for correlation with distributed traces. This enables powerful debugging workflows.
Set Appropriate Retention
Balance storage costs with retention needs. A typical retention period is 7-30 days for logs, longer for critical systems.
Next Steps
- View Metrics - Learn about metrics collection with Prometheus and Mimir
- Configure Tracing - Set up distributed tracing with Tempo
- Observability Overview - Return to bundle overview
