Walt AI Inc.
Bangalore, Mangalore, Hyderabad
Team: Core Platform / AI Systems
Stage: Early, high-leverage role
<h3>Why this role exists</h3>
AI Engineers own Walt’s brain.
Walt turns natural language into accurate, explainable analytics over real enterprise data. That only works if the reasoning layer is strong: understanding intent, planning multi-step queries, retrieving the right context, executing safely, and explaining results clearly.
This is where the magic happens.
As an AI Engineer, you design and build the LLM reasoning systems, agentic workflows, and retrieval architecture that power Walt. This role blends prompt engineering, multi-agent system design, and production-grade ML engineering. You’ll go deep on individual components while maintaining a systems-level view of how everything fits together.
If you care about reasoning correctness, not just demos, and want to build AI systems that hold up in enterprise environments, this role is for you.
<h3>What you’ll do</h3>
Build Walt’s reasoning layer
- Design and implement LLM reasoning frameworks (ReAct, chain-of-thought, tool-use patterns)
- Translate user intent into structured plans, tool calls, and executable queries
- Continuously improve reasoning reliability and explainability
Design agentic workflows
- Build multi-step, multi-agent workflows for complex analytical questions
- Handle planning, execution, retries, and failure modes
- Balance autonomy with guardrails
Implement retrieval & context systems
- Build RAG pipelines for schema context, metrics, and semantic search
- Design chunking, ranking, and retrieval strategies that actually improve accuracy
- Integrate vector search with structured metadata and constraints
Structure and validate outputs
- Design structured output systems using function calling and MCP
- Enforce schemas, contracts, and invariants across agent outputs
- Build robust output validation for enterprise use cases
Measure and improve accuracy
- Build evaluation pipelines and offline/online benchmarks
- Design feedback loops to systematically improve correctness
- Track failure modes and regression risks over time
Ship production-grade AI systems
- Implement safety guardrails, rate limits, and cost controls
- Ensure latency, reliability, and debuggability
- Work closely with data engineers and FDEs to ground systems in real data
Deep LLM expertise
- Hands-on experience with GPT-4, Claude, or similar models
- Strong intuition for model behavior, strengths, and failure modes
- Knows where LLMs break, and how to design around it
Advanced prompt engineering
- Few-shot prompting, chain-of-thought, ReAct, self-consistency
- Can reason about prompts as programs, not text blobs
Retrieval systems experience
- Vector databases (Pinecone, pgvector, or similar)
- Embedding models, chunking strategies, ranking and filtering
- Understands when RAG helps, and when it hurts
Agent orchestration
- Experience with LangChain, LangGraph, or equivalent patterns
- Comfortable building multi-agent coordination and tool-calling systems
- Thinks in graphs and state machines, not linear scripts
Strong Python engineering
- Production-grade Python, async programming, API design
- Experience building and operating ML-backed services
- You’re building core reasoning infrastructure, not UI wrappers
- Accuracy and trust matter more than flashy demos
- Your work directly defines what Walt can and cannot do
- You’ll influence architecture, eval strategy, and long-term technical direction
- We’re building agentic AI for real enterprise data, not toy problems
- Founding team with deep data engineering and systems experience
- Small team, high ownership, fast iteration
- We value intellectual honesty, directness, and people who care deeply about correctness
Original posting: https://hasjob.co/thewalt.ai/ms5a5