AI Engineer Is a New Role
Building the Agent Is Easy. The Loop Is the Job.
Most of this thinking came from my time at Netflix, where I ramped up engineers on AI work and got it shipped to production. The same questions keep coming up. What’s the difference between AI engineering and ML? Why is the agent I built falling over the second I put it in front of real users? What am I supposed to be measuring?
After answering enough variations of those, I got pretty clear on what I think AI engineering actually is. AI engineer is its own role. It’s not a rebrand of the ML engineer role. It’s not “the developer who calls an LLM API.” It’s a distinct discipline with its own skills, mindset, and loop. Here’s the case.
Demos Are Easy. Dependability Is the Job.
It’s so easy to make something that demos really well. Five minutes of vibe coding, a clean prompt, the happy path, and you’ve got a tweet-worthy clip. Then I throw three ants into your prompt and let me see what happens. It’s bad. Because it’s not really intelligent. It’s a predictor.
The whole craft sits in that gap. How do you take something that has potential, that’s really just a predictor, and make it dependable enough to ship to people solving real problems? Because if it’s not dependable, it’s not useful. Same as hiring someone who isn’t dependable. That person probably won’t keep their job for long.
That gap, between “demos great” and “dependable in production,” is the AI engineer’s full-time problem.
AI Engineer vs ML Engineer
I get this question a lot, especially from engineers ramping into AI work from a traditional product or ML background. What’s the difference?
Machine learning engineers focus on training models, gathering and managing datasets, and optimizing model performance. They live in the model layer. The science, the architecture, the workflows around training. Research engineers and research scientists sit alongside them, writing the white papers and running the experiments on which the field is built.
AI engineers live at the application layer. We take those models and that research and turn them into products that work for real users. If you go deep enough into this, you’ll find yourself reading mathematical white papers, thinking, “Okay, this is a novel performant agentic architecture that’s reliable at selecting tools, I’m going to implement that.” That’s a real thing AI engineers do. But your output is a working product, not a trained model.
The Four Skills That Keep Showing Up
I went and looked at AI engineer job postings on LinkedIn. And yes, LinkedIn is not where you go for great job statistics, but it’s there. Four skills came up over and over:
- RAG
- Evals
- Agents
- Production deployment
Three of those are teachable as a curriculum. Production deployment is so specific to where you work that the best thing to do is teach you the questions to ask.
Under those headline skills sits the day-to-day work. This is where the discipline actually lives.
Context engineering. Some term made up in Silicon Valley. Basically, how do I send the right tokens to the model at the right time? Tokens are currency. They correlate directly to energy cost. We’re all heading toward tokens per watt as the real unit of measure. Tokens really matter.
Tool design. How do we give agents the right abilities? Make sure they can do the right things and can’t do the wrong things.
Evaluation. How do we measure our agents so we can tell whether they’re actually improving, or we just feel like they are?
Production reliability. Self-healing, user experience, how a user knows when something is broken, handling errors and latency. The stuff that decides whether the system survives contact with reality.
It’s a completely different way of thinking about building applications. And it lives at the application layer, which is what makes it AI engineering and not ML engineering.
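To make the context engineering point concrete, here is a minimal sketch of packing context under a token budget. Everything in it is an assumption for illustration: the `Snippet` type, the `relevance` score (imagine it coming from a retriever), and especially `estimate_tokens`, which counts words as a crude stand-in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    text: str
    relevance: float  # hypothetical retriever score, higher is better


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def pack_context(snippets: list[Snippet], budget: int) -> list[Snippet]:
    """Greedily keep the most relevant snippets that still fit in the budget."""
    chosen: list[Snippet] = []
    used = 0
    for snippet in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        cost = estimate_tokens(snippet.text)
        if used + cost <= budget:
            chosen.append(snippet)
            used += cost
    return chosen


snippets = [
    Snippet("a b c", relevance=0.9),
    Snippet("d e f g", relevance=0.5),
    Snippet("h i", relevance=0.7),
]
packed = pack_context(snippets, budget=5)
```

Greedy selection is only one possible policy. The point is that what goes into the prompt is an explicit, testable decision rather than whatever happened to be retrieved.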
The Build, Eval, Improve Loop
I want to get you in the right mindset because that’s what makes the whole thing work. I call it the build-eval-improve loop.
Build → Eval → Improve → Eval → Improve
Building an agent is easy. You can vibe code an agent. There are SDKs that let you do it in five lines. It’s really not that hard. We’re not going to sit around talking about that part. The part that matters is everything that comes after. Evaluate where it’s bad. Figure out why it’s bad. Apply the right technique to fix that specific failure. Evaluate again.
This never stops. It’s not a job that’s ever going to be “done.” This is the role required for a non-deterministic system that must be dependable. There’s no “ship it and move on.” There’s only the loop.
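Here is a rough sketch of what one turn of the eval step could look like in code. It is not a prescribed harness. The `toy_agent`, the case format, the failure categories, and exact-match scoring are all assumptions made up for illustration; a real agent and a real scorer would replace them.

```python
from collections import Counter
from typing import Callable

# Each case: (input, expected answer, category to file a failure under).
EvalCase = tuple[str, str, str]


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> tuple[float, Counter]:
    """Return the overall pass rate and a count of failures per category."""
    failures: Counter = Counter()
    passed = 0
    for prompt, expected, category in cases:
        # Exact match (case-insensitive) is the simplest possible scorer.
        if agent(prompt).strip().lower() == expected.lower():
            passed += 1
        else:
            failures[category] += 1
    return passed / len(cases), failures


def toy_agent(prompt: str) -> str:
    # Stand-in for a real agent call; it only knows one answer.
    return "4" if prompt == "2 + 2" else "unknown"


cases = [
    ("2 + 2", "4", "arithmetic"),
    ("3 * 3", "9", "arithmetic"),
    ("capital of France", "paris", "facts"),
    ("capital of Japan", "tokyo", "facts"),
]
pass_rate, failures = run_evals(toy_agent, cases)
# The category with the most failures is where the next "improve" step goes.
worst_category, _ = failures.most_common(1)[0]
```

Grouping failures by category is what turns “it’s bad” into “it’s bad at this,” which is what lets you pick a specific fix before running the evals again.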
Why This Becomes a Whole Team
If you don’t believe me that AI engineering is its own discipline, go look at OpenAI’s job postings. They aren’t hiring AI engineers in the abstract. They’re hiring people for one specific slice of the system. One team for tool selection. One team for human in the loop. One team for safety. One team just trying to get the token counts down without losing accuracy.
ChatGPT is still bad at a lot of things. That’s with entire teams dedicated to specific subsystems. That’s the scale of effort it takes when your product is an agent. It is not a side responsibility for a full-stack engineer. It’s a discipline.
As more companies become AI-native, with the product itself being just an agent, we’re going to see massive teams of AI engineers, each working on a specific part. “Your job is to work on tool selection. Your job is to be a human in the loop. Your job is to bring these tokens down.” That’s the future, and it’s already here at the frontier labs.
The Hardest Part Is Picking the Metrics
In my opinion, the hardest part of this job isn’t the code. It’s about figuring out which data to use for evals and which metrics to score against. What are the most appropriate metrics that give the best signal? How do we score them? How do we evolve that scoring as the system gets more dependable?
That’s a lot of science and art and hand-wavy stuff. But it’s the foundation on which everything else is built. Pick the wrong metrics and your loop gets you nowhere. Pick the right ones and the whole system compounds.
This is why the role is its own thing. A software engineer optimizes deterministic code paths. A machine learning engineer optimizes a model. An AI engineer optimizes a feedback loop on top of a non-deterministic system, and most of the leverage comes from choosing what to measure.
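One example of a metric choice that non-determinism forces on you: a single passing run says little, so you can score each case by how often it passes across repeated runs. The sketch below is an illustration under stated assumptions. `flaky_agent` is a made-up stand-in that cycles through canned answers to imitate run-to-run variation, and the 0.95 threshold for “dependable” is an arbitrary number, not a recommendation.

```python
import itertools
from typing import Callable


def pass_rate_per_case(
    agent: Callable[[str], str],
    cases: dict[str, str],
    runs: int = 10,
) -> dict[str, float]:
    """Run every case `runs` times and return the fraction of passing runs."""
    rates: dict[str, float] = {}
    for prompt, expected in cases.items():
        passes = sum(agent(prompt) == expected for _ in range(runs))
        rates[prompt] = passes / runs
    return rates


# Canned answers that repeat: right, right, wrong, right, right, wrong, ...
_outcomes = itertools.cycle(["yes", "yes", "no"])


def flaky_agent(prompt: str) -> str:
    # Stand-in for a non-deterministic model: "stable" always answers the
    # same way, anything else varies from call to call.
    if prompt == "stable":
        return "yes"
    return next(_outcomes)


rates = pass_rate_per_case(flaky_agent, {"stable": "yes", "flaky": "yes"}, runs=6)
# Hypothetical bar for calling a case dependable.
dependable = {prompt for prompt, rate in rates.items() if rate >= 0.95}
```

A case that passes four runs out of six would look fine in a one-shot eval most of the time. Scoring it across runs is what exposes it.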
The Practice Area, Not the Buzzword
Calling AI engineering a new discipline isn’t a marketing claim. It’s what the work looks like once you stop demoing and start shipping. Different layer than ML. Different mindset than traditional app development. Different loop. Different metrics. Increasingly its own career path.
If you’re a developer wondering whether to lean into this, here’s the simplest framing I’ve got.
The agent is the easy part. The loop is the job.