
Senior MLOps / ML Platform Engineer

  • Permanent
  • $300,000 - $500,000 total comp
  • United States, California, San Francisco
  • Data Infrastructure & MLOps


Location: Remote (U.S.) | Preference for SF Bay Area
Type: Full-time, Permanent
Salary Range: $180,000 – $250,000 + Equity + Benefits


About the Opportunity

People in AI is working with a confidential, late-stage startup that’s scaling one of the most advanced ML platforms in production. The company operates at enormous scale, supporting trillions of real-time and batch interactions across its data infrastructure, and it is hiring experienced engineers to help build the backbone of its machine learning practice.

You’ll join a high-impact ML Platform team that owns the infrastructure used by 20+ ML Engineers and Data Scientists — enabling faster experimentation, deployment, and monitoring of models in production.


What You’ll Work On

  • Design, build, and operate ML infrastructure for training, deployment, and inference
  • Scale and manage feature stores powering real-time and batch use cases
  • Develop high-throughput pipelines using Ray, Apache Spark, and Kafka
  • Improve latency and reliability of ML model serving (GPU + CPU)
  • Work with tools like MLflow, Argo, Terraform, and Kubernetes (EKS)
  • Build internal tooling and automation to improve ML developer workflows
  • Collaborate closely with cross-functional ML teams to enable experimentation at scale

Ideal Background

  • 5+ years in MLOps, ML Platform Engineering, Data Engineering, or Infrastructure
  • Strong experience with Apache Spark, Spark Structured Streaming, Kafka, Ray, or similar tools
  • Proven experience building or scaling feature stores (e.g. Tecton, Feast)
  • Deep understanding of online vs offline inference, and how to optimize for both
  • Hands-on experience with Kubernetes (EKS), Terraform, and cloud-native infra (AWS preferred)
  • Background in software engineering, with a strong focus on production-grade systems
  • Bonus: experience managing GPU compute environments or working with CI/CD for ML workflows

Tech Stack Highlights

  • Infra: Kubernetes (EKS), Terraform, Helm, Istio, Cloudflare
  • Pipelines: Spark, Ray, Kafka, Airflow
  • Languages: Python, Java, Scala
  • Serving & Orchestration: MLflow, Argo Workflows, Argo CD
  • Monitoring: Datadog, Prometheus
  • Modeling tools: Hugging Face 🤗, PyTorch, TensorFlow, Metaflow

Why Apply

  • Join at a pivotal time — huge ownership and technical influence
  • Work on systems used by hundreds of millions of users
  • Competitive compensation + strong equity upside
  • Remote flexibility + preference for Bay Area engineers for in-person collaboration