Senior Machine Learning Evaluation Engineer (LLMs & Decision Quality)
NYC
$500,000 - $900,000 total comp
We’re working with a world-class, research-driven organization operating in a high-stakes decision-making environment to hire a Senior Machine Learning Evaluation Engineer.
This role sits at the intersection of AI, data, and judgment. The focus is not on building flashy demos or optimizing infrastructure, but on answering a harder question:
When should an AI system be trusted?
The role
You’ll be responsible for designing and owning the evaluation layer for large language models and AI systems used to support real, consequential decisions.
This includes:
You’ll operate as a lead individual contributor with real influence over how AI quality is defined, measured, and enforced.
What this role is not
This is a hands-on engineering role focused on evaluation, trust, and decision quality.
What we’re looking for
We’re looking for someone who has actually owned model evaluation, not just consumed metrics.
Strong signals include:
Backgrounds that tend to work well:
LLM experience
Hands-on experience with LLMs is highly relevant, especially around:
That said, evaluation maturity matters more than novelty.
Why this role is interesting
Compensation & location
If you’re excited by the idea of building the yardsticks that decide whether AI systems are actually helping or quietly harming decision-making, we’d love to hear from you.