Senior Distributed Systems Engineer – AI Inference
$200K–$260K base + equity
Bay Area, CA (on-site only)
Next-gen infrastructure for large models
A well-funded startup is rethinking how we run today’s largest machine learning models at scale. Their focus: building highly efficient systems for transformer inference, designed from the hardware up. With a team of experienced systems engineers, ML practitioners, and hardware experts, they’re creating infrastructure purpose-built for running massive AI workloads faster and more reliably than current approaches allow.
Now they’re hiring a hands-on systems engineer to help scale the software that powers this next-gen compute platform. The work is performance-critical and latency-sensitive, sitting squarely at the intersection of machine learning and distributed systems.
What You’ll Do
- Design and implement strategies for running transformer and MoE models across multi-node compute clusters
- Write performance-optimized code in Python and C++, interfacing with ML frameworks like JAX or PyTorch
- Collaborate closely with platform and hardware engineers to coordinate model execution across custom infrastructure
- Troubleshoot complex system interactions, optimize data flows, and contribute to orchestration tooling
- Own software performance end-to-end, from modeling to deployment
What They’re Looking For
- Deep experience building distributed systems at scale
- Strong understanding of how modern ML models (LLMs, MoEs) operate in production
- Comfort working with orchestration tools (Kubernetes, Slurm, or similar)
- Proficiency in C++ and Python, and familiarity with model-serving architectures
- A systems mindset: performance tuning, memory efficiency, and reliability are second nature
Tech Stack & Environment
- Languages: C++, Python
- Frameworks: PyTorch, JAX (or equivalent)
- Infra: Container orchestration, custom hardware, internal runtime environments
- Culture: Deeply technical, fast-moving, collaborative, on-site
Why Join
This is a rare opportunity to help shape how large-scale AI inference gets done: not by tweaking existing tools, but by building new ones from scratch. If you're excited by low-level performance, cutting-edge model architectures, and collaborating closely with hardware teams, this is a place to have real impact.
About People In AI
We’re a boutique recruiting firm focused exclusively on top-tier AI and machine learning talent. We work directly with engineering leadership at frontier tech companies to connect them with candidates who want meaningful, technically challenging roles. When we reach out, it’s because we think it’s a genuinely strong fit.