Staff Data Engineer
Remote, United States
$200,000 to $250,000 base
AI-powered vertical data platform
We are working with a high-growth AI technology company building a data-rich platform for a large, relationship-driven industry undergoing rapid digital transformation.
The business has strong commercial traction and significant investor backing, and it is investing heavily in AI product and engineering as it scales a platform used by thousands of professionals to manage digital growth, customer engagement, and operational workflows.
They are hiring a Staff Data Engineer to join a small, senior data platform team responsible for one of the company’s most important technical systems: ingesting, normalizing, and serving high-volume third-party data from hundreds of fragmented external sources.
This is a hands-on staff-level role for someone who enjoys hard data infrastructure problems, distributed systems, backend engineering, and the practical use of AI to improve operational workflows.
About the Role
You will help lead the next phase of a large-scale data platform that powers customer-facing products, internal automation, search and discovery features, recommendations, and AI-enabled workflows.
The core platform is already live, but there is significant work ahead around scalability, reliability, cost optimization, data quality, observability, and automation.
The team works across streaming and batch data pipelines, backend services, orchestration, data lake infrastructure, and AI agents that support internal teams by reducing manual investigation, improving issue triage, and accelerating data onboarding.
This is not a narrow data pipeline role. You will be expected to shape architecture, work directly with product and engineering stakeholders, and stay close to implementation.
What You’ll Do
Own and evolve large-scale data pipelines that ingest and normalize high-volume data from hundreds of external feeds.
Design and improve event-driven data flows using Kafka or similar messaging technologies.
Build and operate backend services and APIs that expose core data to internal systems and customer-facing products.
Work with Spark or Flink across batch and streaming workloads.
Improve orchestration, reliability, monitoring, alerting, and data quality across production data systems.
Partner with AI and ML engineers on agentic workflows that automate operational triage, data onboarding, and internal support processes.
Help define evaluation, logging, telemetry, and feedback loops for AI-enabled systems.
Drive technical design discussions and make practical tradeoffs around cost, performance, reliability, and delivery speed.
Mentor other engineers while remaining deeply hands-on.
Tech Stack & Environment
The environment includes Python and Java backend services, Kafka, Spark, Airflow, Kubernetes, AWS, EMR, data lake infrastructure, SQL, and modern observability tooling.
The company is also working with AI agent frameworks and LLM-powered workflows to automate internal processes and improve data operations. Experience with frameworks and tools such as LangChain, LangGraph, PydanticAI, or Claude Code is useful, but strong engineering fundamentals matter most.
What We’re Looking For
Strong hands-on experience with data-intensive or distributed systems at scale.
Deep production experience with Spark, Flink, or similar large-scale processing frameworks.
Experience with Kafka, Kinesis, Pub/Sub, or another event streaming platform.
Backend engineering experience in Python, Java, or a similar language.
Experience with Airflow or comparable workflow orchestration tools.
Strong SQL, data modeling, and data quality fundamentals.
Comfort working with Kubernetes and cloud infrastructure.
Ability to operate at staff level while still owning implementation details.
Strong communication skills and the ability to work closely with product, engineering, operations, and support teams.
A practical interest in AI, including how AI tools and agents can improve engineering workflows and operational efficiency.
Nice to Have
Experience with Iceberg, Hive, Athena, Redshift, Snowflake, or similar data lake and warehouse technologies.
Experience with data observability or quality tools.
Experience with vector databases, retrieval systems, or AI agent monitoring.
Background working with fragmented third-party data, marketplace data, search, recommendations, advertising, or other high-volume customer-facing data products.
Startup or high-growth company experience.
Why This Role Is Interesting
You will join a small team with ownership over a very large technical surface area. The platform handles complex, messy, high-value third-party data across hundreds of sources, and the work has a direct impact on customer-facing products and internal AI automation.
The company is pushing AI into real operational workflows, not treating it as a side experiment. This role offers the chance to work on practical AI automation, high-scale data infrastructure, and backend systems in one position.
It is a strong fit for someone who wants staff-level scope without losing the ability to build.
About People In AI
People In AI partners with high-growth AI and technology companies to help them hire specialist technical talent. We work closely with hiring teams to understand the role, the technical environment, and the expectations before speaking with candidates, so conversations are focused, transparent, and useful from the start.