Staff AI Engineer
$225K–$250K + meaningful equity
San Francisco, CA (Onsite)
A well-funded AI infrastructure startup is hiring a Staff AI Engineer to help build the core agentic intelligence layer powering automation inside complex engineering software environments. The product already has meaningful traction with Fortune 100 customers and is backed by top-tier investors in the AI ecosystem.
This is a highly technical, deeply hands-on role for someone who wants to work on difficult real-world agent problems, not lightweight chatbot wrappers or internal prototypes.
The Opportunity
You’ll own foundational agent architecture and help define how AI systems reliably execute complex workflows inside real enterprise environments.
This role sits directly alongside company leadership and will heavily influence the future direction of the platform, evaluation infrastructure, and broader AI engineering strategy.
The environment is highly execution-oriented. Leadership is measured by technical contribution, system ownership, and the ability to ship production systems under ambiguity.
What You’ll Be Doing
Build and improve production agentic AI systems capable of executing multi-step workflows across desktop software environments
Own core architectural decisions around:
Tool orchestration
Context management
State handling
Error recovery
Model routing
Workflow execution reliability
Design and scale evaluation frameworks measuring:
Workflow success rate
Reliability
Failure modes
Cost efficiency
Regression detection
Define token budgets and optimize inference efficiency for commercially viable agent execution
Work closely with researchers, domain experts, and product stakeholders to translate real user workflows into measurable agent benchmarks
Lead technically while remaining highly hands-on in implementation and architecture
Collaborate with customers and internal teams to improve workflow coverage and production performance
What They’re Looking For
Strong production experience building agentic AI systems, not just LLM-powered interfaces
Deep understanding of:
Tool-calling agents
Multi-step orchestration
State management
Context handling
Workflow reliability under ambiguity
Strong Python engineering background
Experience designing evaluation and benchmarking systems for AI agents or complex ML systems
Comfort operating in highly ambiguous startup environments with broad ownership
Strong systems thinking and architecture instincts
Ability to remain deeply hands-on technically while influencing engineering direction
Strong Signals
Experience with:
SWE-bench, GAIA, or similar agent benchmarks and evaluation frameworks
Tracing and observability tooling such as LangSmith or Logfire
Agent orchestration frameworks
Desktop automation
Enterprise AI deployments
Workflow automation systems
Startup experience is highly valued, especially in technically demanding product environments
Location / Work Setup
Full-time onsite in San Francisco
Flexible hours and a high-autonomy environment
Compensation includes meaningful equity participation