We tried Fluent Bit + Lua to scrape Docker logs into OpenObserve, hit schema and parsing pain, and pivoted to emitting OpenTelemetry directly.

Why we started

We wanted centralized, structured logs for our scheduler heartbeats and application events in OpenObserve. Fluent Bit looked like a quick, lightweight choice to tail Docker logs, enrich them via Lua, and forward them over HTTP.

The setup

  • Stack: Docker Compose (backend, Fluent Bit, OpenObserve).
  • Fluent Bit: Tail Docker logs, run a Lua filter (tag.lua) to strip ANSI codes, extract JSON (heartbeat payload), set log_event, and forward to OpenObserve.
  • Goal: Query OpenObserve for log_event = 'ingestion_scheduler_heartbeat' with full JSON in log_message.

The reality: time spent vs. value gained

This was supposed to take hours. It turned into days of iteration because:

  • ANSI-laden NestJS logs: Needed ANSI stripping before JSON parse.
  • Nested JSON extraction: Heartbeat JSON was embedded after Nest prefixes. We kept rewriting Lua patterns to find the last JSON object and decode it.
  • Schema lag in OpenObserve: Even after emitting log_event, the UI didn’t show the field until we manually added it in the stream schema.
  • Container restarts & bind mounts: Fluent Bit only loaded tag.lua at startup, and restarts sometimes reused old state. We resorted to docker compose up --force-recreate and diffing /fluent-bit/etc/tag.lua.
  • Debugging loops: Enabling Lua debug logging flooded the output, and the debug lines themselves got re-ingested. We added guards, env-driven flags, and a filter for [tag.lua] lines, but it stayed noisy.
  • Tooling limits inside the image: No shell utilities; even printenv was missing. Debugging env required reading /proc/1/environ.
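
To make the first two items concrete, here is a minimal TypeScript sketch of the same two steps: strip ANSI codes, then decode the last JSON object on the line. It is not our tag.lua; the function names and the sample line are illustrative assumptions. Trailing text that contains a brace can still defeat this kind of scan, which is the sort of fragility we kept running into.

```typescript
// Matches common ANSI CSI sequences such as "\x1b[32m" or "\x1b[38;5;3m".
const ANSI_PATTERN = /\x1b\[[0-9;]*[A-Za-z]/g;

// Removes ANSI color/style codes from a log line.
function stripAnsi(line: string): string {
  return line.replace(ANSI_PATTERN, "");
}

// Returns the last top-level JSON object embedded in a log line, or null.
// Strategy: fix the end at the final "}", then walk "{" candidates from
// right to left until one of them parses as a JSON object.
function extractLastJsonObject(line: string): Record<string, unknown> | null {
  const clean = stripAnsi(line);
  const end = clean.lastIndexOf("}");
  for (let start = end; start >= 0; start--) {
    if (clean[start] !== "{") continue;
    try {
      const parsed: unknown = JSON.parse(clean.slice(start, end + 1));
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        return parsed as Record<string, unknown>;
      }
    } catch {
      // Not valid JSON from this brace (e.g. an inner object); try an earlier one.
    }
  }
  return null;
}

// Hypothetical Nest-style line: colored prefix followed by a heartbeat payload.
const sampleLine =
  "\x1b[32m[Nest] 42 - \x1b[39mLOG \x1b[33m[Scheduler]\x1b[39m " +
  '{"event":"ingestion_scheduler_heartbeat","stats":{"pending":3}}\x1b[0m';

console.log(extractLastJsonObject(sampleLine));
```

Every line has to survive both steps before a field like log_event can be set, so any change in the log prefix or color settings is a potential parsing regression.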

The turning point: why we pivoted

After multiple rounds of Lua fixes, restarts, and schema tweaks, the effort-to-reward ratio was poor. We realized:

  • We were spending most of our time on plumbing (parsing, restarts, schema alignment) rather than on the product.
  • OpenObserve already supports OpenTelemetry ingestion, which can carry structured logs without tail-parsing.
  • Eliminating the tail/parse hop removes an entire class of failure modes (ANSI stripping, regex fragility, schema drift).

The decision: remove Fluent Bit, go direct with OpenTelemetry

We dropped Fluent Bit and switched to emitting logs directly via OpenTelemetry (OTLP) to OpenObserve. Benefits:

  • No log scraping/parsing: We emit structured payloads as-is.
  • No ANSI issues: We send clean, structured data directly from the app.
  • Predictable schema: log_event and other fields arrive with the payload; OpenObserve sees them without Lua gymnastics.
  • Simpler ops: One fewer container to manage; no custom Lua; no bind-mount drift.
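
For a sense of what "emit structured payloads as-is" can look like, here is a minimal sketch that builds an OTLP/HTTP JSON logs request by hand and posts it with fetch. This is an illustration, not our production code: in practice you would more likely use the OpenTelemetry SDK for your language. The function names, the scope name, the endpoint URL, and the Basic auth header are assumptions; check your OpenObserve instance for the actual OTLP path and credentials. How the backend maps the body and attributes onto stream fields is also backend-specific.

```typescript
// Builds an OTLP logs request body (JSON encoding) carrying one log record.
// log_event travels as an attribute; the heartbeat payload goes in the body.
function buildOtlpLogs(
  serviceName: string,
  logEvent: string,
  payload: Record<string, unknown>,
  nowMs: number = Date.now(),
) {
  return {
    resourceLogs: [
      {
        resource: {
          attributes: [{ key: "service.name", value: { stringValue: serviceName } }],
        },
        scopeLogs: [
          {
            scope: { name: "app-logger" }, // hypothetical scope name
            logRecords: [
              {
                // Nanoseconds since epoch as a string: integer ms plus six zeros.
                timeUnixNano: `${Math.trunc(nowMs)}000000`,
                severityNumber: 9, // INFO in the OTLP severity scale
                severityText: "INFO",
                body: { stringValue: JSON.stringify(payload) },
                attributes: [{ key: "log_event", value: { stringValue: logEvent } }],
              },
            ],
          },
        ],
      },
    ],
  };
}

// Posts the request to an OTLP/HTTP logs endpoint and returns the HTTP status.
// Example endpoint (assumed): "http://localhost:5080/api/default/v1/logs".
async function sendLogs(endpoint: string, authHeader: string, body: unknown): Promise<number> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: authHeader },
    body: JSON.stringify(body),
  });
  return res.status;
}

const heartbeat = buildOtlpLogs(
  "backend",
  "ingestion_scheduler_heartbeat",
  { pending: 3 },
  1700000000000,
);
console.log(JSON.stringify(heartbeat, null, 2));
```

The point of the sketch is that nothing here touches a log line: the payload never passes through stdout, so there is nothing to strip or re-parse.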

Lessons learned

  • Prefer structured emission over downstream parsing. If you control the app, emit structured logs once; avoid regex/Lua filters.
  • Schema awareness matters. OpenObserve won’t show new fields until they’re added to the stream schema or ingested, so plan for schema registration.
  • Container restarts aren’t always enough. For config bind-mount changes, use --force-recreate to guarantee reload.
  • Debugging needs hygiene. Make debug logs opt-in, filter self-logs, and avoid re-ingesting them.
  • Use the native ingestion path when available. OTLP cut our integration time dramatically compared to tail/parse.

What we’d do next time

  1. Start with OpenTelemetry direct to OpenObserve.
  2. Keep logging strictly structured in the app layer.
  3. Minimize sidecar parsing; reserve Fluent Bit for edge cases where we don’t control the emitter.

FAQ

When should I use Fluent Bit vs direct OpenTelemetry logging?
Use Fluent Bit when you need to collect logs from sources you can't modify (system logs, third-party containers). Use direct OpenTelemetry when you control the application code and want structured logs without parsing complexity.
Was the problem with Fluent Bit or the approach?
Fluent Bit itself is solid, but tailing and parsing Docker logs introduced complexity: ANSI codes, nested JSON extraction, schema alignment, and restart state issues. The approach was the problem, not the tool.
Does OpenTelemetry work with all logging systems?
Most modern observability platforms support OpenTelemetry ingestion. Check whether your backend accepts OTLP (OpenTelemetry Protocol) over HTTP or gRPC. OpenObserve, Datadog, New Relic, and many others have native support.
What about log volume and performance?
OpenTelemetry exporters can batch and compress log records before sending them, which helps with high-volume logging. Direct emission also avoids the CPU overhead of log parsing in Fluent Bit, which matters at scale.
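
The batching idea can be sketched in a few lines. This is a toy illustration, not the OpenTelemetry SDK's batch processor (which also handles timed flushes, queue limits, and retries); the class and parameter names here are made up for the example.

```typescript
// Buffers records and hands them to `send` in groups of `maxBatchSize`,
// so N log records cost roughly N / maxBatchSize requests instead of N.
class LogBatcher<T> {
  private buffer: T[] = [];
  private readonly maxBatchSize: number;
  private readonly send: (batch: T[]) => void;

  constructor(maxBatchSize: number, send: (batch: T[]) => void) {
    this.maxBatchSize = maxBatchSize;
    this.send = send;
  }

  // Queues one record and flushes once the batch is full.
  add(record: T): void {
    this.buffer.push(record);
    if (this.buffer.length >= this.maxBatchSize) this.flush();
  }

  // Sends whatever is buffered; call this on shutdown so nothing is lost.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch);
  }
}

// Usage: five records with a batch size of two produce three sends.
const sentBatches: string[][] = [];
const batcher = new LogBatcher<string>(2, (batch) => sentBatches.push(batch));
for (const msg of ["a", "b", "c", "d", "e"]) batcher.add(msg);
batcher.flush();
console.log(sentBatches);
```

A real exporter would also flush on a timer so that a quiet service does not hold records indefinitely.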
