# Toloka — ML Lead Engineer

| Field | Value |
|---|---|
| **Date found** | 2026-05-21 |
| **Company** | Toloka |
| **Role** | ML Lead Engineer |
| **Location** | EU Remote |
| **Salary** | Undisclosed (competitive package: base + bonus + ESOP) |
| **Job URL** | https://www.linkedin.com/jobs/view/4417752502/ |
| **Status** | New |

---

## Company Research

| Field | Value |
|---|---|
| **Headquarters** | Amsterdam, Netherlands |
| **Founded** | 2014 |
| **Employees** | 51–200 (LinkedIn size band); 1,123 associated members on LinkedIn |
| **LinkedIn** | [company/toloka](https://www.linkedin.com/company/toloka/) — 146K followers |
| **Website** | [toloka.ai](https://toloka.ai) |
| **Blog** | [toloka.ai/blog](https://toloka.ai/blog/) |

- **Product:** AI training data and evaluation platform — curated datasets, RLHF, and evaluation services for LLMs and AI agents
- **Customers:** Anthropic, Amazon, Microsoft, Shopify
- **Notable:** Originally a Yandex crowdsourcing platform; backed by Bezos Expeditions ($72M, May 2025) and Nebius Group as strategic investor

---

## Job Summary

**What they do:** AI training data and evaluation platform for leading GenAI models — clients include Anthropic, Amazon, Microsoft, and Shopify. Backed by Bezos Expeditions ($72M round, 2025).

**The role:** ML Lead Engineer in the Delivery division — leads applied ML initiatives integrated directly into active client engagements.

**Core work:**
- Design agentic workflows for data generation: tool use, planning, retrieval, critique loops
- Build automated judge models and evaluation harnesses
- Productise reusable LLM pipeline components across multiple client accounts

**Stack:** Python · LLMs · LangChain/LangGraph (implied) · evaluation frameworks · data pipelines

**Work style:** Fully remote, EU

---

## Score: 75%

| Dimension | Score | Justification |
|---|---|---|
| Agentic AI depth (25%) | 70% | Agentic workflows for data generation, automated evaluation, LLM pipelines; real agentic work, but in a data/eval delivery context rather than product R&D |
| Tech fit (25%) | 65% | Python, LLMs, agents, evaluation harnesses, prompt engineering — good overlap; no explicit LangGraph/CrewAI but agentic patterns match |
| Remote fit (25%) | 100% | Fully remote EU |
| Company culture fit (15%) | 70% | AI-native data company with Anthropic and Shopify as clients, startup scale (51–200 employees), backed by Bezos Expeditions; good signals overall |
| IC/leadership balance (10%) | 55% | Mixed: IC agentic engineering plus team hiring and coaching; more leadership than Luca's IC-only preference calls for |
| **Final (weighted)** | **75%** | |
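
The final score appears to be the weighted average of the five dimension scores. The sketch below reproduces that arithmetic, assuming the percentages in parentheses are the weights and that the result is rounded to the nearest whole percent. The names `DIMENSIONS` and `weighted_score` are illustrative and not part of the original notes.

```python
# Weights and scores copied from the table above, as whole percentages:
# dimension -> (weight, score)
DIMENSIONS = {
    "Agentic AI depth": (25, 70),
    "Tech fit": (25, 65),
    "Remote fit": (25, 100),
    "Company culture fit": (15, 70),
    "IC/leadership balance": (10, 55),
}


def weighted_score(dimensions):
    """Return the weighted average of the scores, in percent."""
    total_weight = sum(weight for weight, _ in dimensions.values())
    # Assumption: the weights are meant to cover 100% between them.
    assert total_weight == 100, "weights should sum to 100"
    return sum(weight * score for weight, score in dimensions.values()) / total_weight


final = weighted_score(DIMENSIONS)
print(f"{final:.2f}% -> rounds to {round(final)}%")
```

The unrounded value is 74.75%, which matches the 75% shown in the table once rounded.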

---

## Strengths

- Fully remote EU — no location friction
- Clients include Anthropic — strong signal of AI credibility and interesting downstream work
- Agentic workflows (tool use, planning, critique loops) are genuinely hands-on
- ESOP + competitive comp package
- Low applicant count at discovery (6 applicants)

---

## Weaknesses & Risks

- "Team building: Hire, coach, mentor" is a hard expectation — mixed IC/management
- Salary undisclosed — must verify it meets €110k+ target early
- Role is more delivery/client-facing than product R&D; less ownership over the AI product itself
- Company is relatively small (51–200 employees) with an uncertain growth trajectory

---

## Suggestions

- Ask early about the team size and how much direct IC coding vs management is expected
- Verify the salary range before investing in the full interview process
- Emphasise agentic pipeline and evaluation harness experience; highlight any LLM judging or auto-grading work

---

## Interview Tracker

| Stage | Date | Notes |
|---|---|---|
| Applied | | |
| Recruiter screen | | |
| Technical interview | | |
| Final round | | |
| Offer / Outcome | | |