(Senior) Cloud Infrastructure Engineer
ClickHouse
Other Engineering
Berlin, Germany · Munich, Germany
EUR 90k-160k / year + Equity
Location
Europe; Berlin; London; Munich; Paris; Zurich
Employment Type
Full time
Location Type
Hybrid
Department
Engineering
Compensation
Salary €90K-€160K per year, plus equity
About Langfuse
Langfuse is an open source LLM engineering platform that helps teams build useful AI applications via tracing, evaluation, and prompt management (mission, product). We are now part of ClickHouse.
We're building the "Datadog" of this category; model capabilities continue to improve, but building useful applications is really hard, both in startups and enterprises.
We are the largest open source solution in this category: trusted by 19 of the Fortune 50, >2k customers, >26M monthly SDK downloads, >6M Docker pulls.
We joined ClickHouse in January 2026 because LLM observability is fundamentally a data problem and Langfuse already ran on ClickHouse. Together we can move faster on product while staying true to open source and self-hosting, and join forces on GTM and sales to accelerate revenue.
Previously backed by Y Combinator, Lightspeed, and General Catalyst.
We're a small, engineering-heavy, and experienced team in Berlin and San Francisco. We are also hiring for engineering in EU timezones and expect one week per month in our Berlin office (how we work).
Why Cloud Infrastructure at Langfuse
Your work will keep Langfuse running — everywhere.
Langfuse processes over a billion trace events per month. When a Fortune 50 company relies on Langfuse in production, they're relying on the infrastructure you operate. You'll own uptime, performance, and cost efficiency across our entire cloud footprint — and you'll make sure every self-hosted deployment runs just as smoothly.
You'll operate Langfuse Cloud on AWS ECS Fargate and ClickHouse Cloud, with Datadog as the observability backbone. You'll also own our public self-hosted infrastructure — including our Helm chart, Docker Compose setup, and everything in between — so that teams from startups to enterprises can run Langfuse on their own terms.
This isn't a "maintain what exists" role. We're scaling fast, and you'll be the person who makes sure the infrastructure grows ahead of demand — not behind it.
Langfuse is now part of ClickHouse, which means the team behind the database at the core of our stack is one channel away. Few infrastructure roles give you that kind of direct access to the people who build your most critical dependency.
How you will grow at Langfuse
Own Langfuse Cloud operations: You'll run our production environments on AWS ECS Fargate and ClickHouse Cloud. You'll manage deployments, autoscaling, capacity planning, and cost optimization — making sure we stay fast and affordable as traffic scales.
Build world-class observability: You'll own our Datadog setup end to end — dashboards, alerts, and SLOs. When something degrades, you'll ensure we know before our customers do. You'll build the monitoring culture that lets the whole team ship with confidence.
Make self-hosting effortless: Thousands of teams run Langfuse on their own infrastructure. You'll own and evolve our Helm chart, Docker Compose configuration, and deployment documentation. You'll turn "works on my machine" into "works on every machine" — from a single-node setup to a multi-region enterprise deployment.
Automate everything: CI/CD pipelines, infrastructure-as-code, automated scaling, zero-downtime deployments. You'll replace manual processes with automation that makes the team faster and the platform more reliable.
Scale for what's next: We're growing fast, and new product directions, such as complex long-running agent observability and real-time evaluation, push the infrastructure in new ways. You'll think ahead about what breaks at 10x scale and build the foundation before we get there. At Langfuse, 10x is always just one quarter away.
Harden security and compliance: As more enterprises adopt Langfuse, you'll help ensure our cloud and self-hosted deployments meet the security and compliance bar that large organizations require.
What we're looking for
Strong infrastructure engineer or SRE who gets excited about running systems at scale and making them better every day
Experience operating production workloads on AWS (ECS/Fargate, networking, IAM, S3, etc.) or on a comparable hyperscaler
Comfortable with container orchestration — Kubernetes and/or ECS, Helm charts, Docker
Experience with infrastructure-as-code (Terraform, Pulumi, CloudFormation, or similar)
Strong monitoring and observability instincts — you've built dashboards and alerts that actually caught problems (Datadog experience is a plus)
You organize yourself. You have strong opinions about reliability, automation, and how to ship infrastructure changes safely
Interest in open source software and genuine enjoyment helping users debug their self-hosted deployments
Thrives in a small, accountable team where your output is visible and matters
CS or quantitative degree preferred
Bonus points:
Experience with ClickHouse Cloud or other managed analytical databases
Background in operating high-throughput event processing or observability infrastructure
Contributions to open source infrastructure tooling (Helm charts, Terraform modules, etc.)
Former founder
Process
We can run the full process to your offer letter in less than 7 days (hiring process).
Tech Stack
We run a TypeScript monorepo: Next.js on the frontend, Express workers for background jobs, PostgreSQL for transactional data, ClickHouse for tracing at scale, S3 for file storage, and Redis for queues and caching. You should be familiar with a good chunk of this, but we trust you'll pick up the rest quickly (Stack, Architecture).
How we ship
We trust you to take ownership of your area (ownership overview). You identify what to build, propose solutions (RFCs), and ship them. Everyone here thinks about the user experience and the technical implementation at the same time. Everyone manages their own Linear.
You're never alone. Anyone on the team is happy to join a whiteboard session with you. 15 minutes of shared discussion can greatly improve the overall output.
We keep a maker's schedule and communicate in a way that protects it. There are two recurring meetings a week: a Monday check-in on priorities (15 min) and a demo session on Fridays (60 min).
Code reviews are mentorship. New joiners get all PRs reviewed to learn the codebase, patterns, and how the systems work (onboarding guide).
We use AI as much as possible in our workflows to make our users happy. We encourage everyone to experiment with new tooling and AI workflows.
Why Langfuse (now part of ClickHouse)
This role puts you at the forefront of the AI revolution, partnering with engineering teams who are building the technology that will define the next decade(s).
This is an open-source devtools company. We ship daily, talk to customers constantly, and fight for great DX. Reliability and performance are central requirements.
Your work ships under your name. You'll appear on changelog posts for the features you build, and during launch weeks, you'll produce videos to announce what you've shipped to the community. You’ll own the full delivery end to end.
We're solving hard engineering problems: figuring out which features actually help users improve AI product performance, building SDKs developers love, visualizing data-rich traces, rendering massive LLM prompts and completions efficiently in the UI, and processing terabytes of data per day through our ingestion pipeline.
You'll work closely with the ClickHouse team and learn how they build a world-class infrastructure company. We're in a period of strong growth: Langfuse is growing organically and accelerating through ClickHouse's GTM. (Why we joined ClickHouse)
If you wonder what to build next, our users are a Slack message or a GitHub Discussions post away.
You're on a continuous learning journey. The AI space develops at breakneck speed and our customers are at the forefront. We need to be ready to meet them where they are and deliver the tools they need just in time.