# ByteSearch (agency) — Senior ML Engineer

| Field | Value |
|---|---|
| **Date found** | 2026-05-27 |
| **Company** | ByteSearch (recruiting agency — client undisclosed) |
| **Role** | Senior ML Engineer (Inference Infrastructure) |
| **Location** | EU Remote |
| **Salary** | €160,000 + equity |
| **Job URL** | https://www.linkedin.com/jobs/view/4413307762/ |
| **Status** | New |

---

## Company Research

Recruiting agency — client undisclosed.

---

## Job Summary

**What they do:** ByteSearch is placing for a "fast-growing infrastructure startup" building systems that optimise large-scale AI and cloud workloads across Kubernetes and GPU infrastructure.

**The role:** Senior ML Engineer focused on LLM inference optimisation — throughput, latency, batching, GPU memory utilisation — in production environments.

**Core work:**
- Optimise LLM inference throughput and latency across GPU environments
- Improve batching, scheduling, quantisation, and memory utilisation strategies
- Profile and debug compute/networking/memory bottlenecks at scale across multi-GPU clusters

**Stack:** Python · PyTorch · Kubernetes · distributed inference tooling · GPU optimisation · observability platforms · CI/CD

**Work style:** Fully remote, Europe-based.

---

## Score: 65%

| Dimension | Score | Justification |
|---|---|---|
| Agentic AI depth (25%) | 25% | Pure ML inference infrastructure / GPU optimisation — no agentic AI, no LLM orchestration or agent design |
| Tech fit (25%) | 65% | Python and PyTorch are a match; GPU/CUDA/inference systems are adjacent to Luca's stack but not his primary domain |
| Remote fit (25%) | 95% | Fully remote Europe — no on-site requirement |
| Company culture fit (15%) | 65% | Remote-first infrastructure startup, fast-paced; culture signals positive but client identity unknown |
| IC/leadership balance (10%) | 90% | Senior IC role with autonomy and ownership |
| **Final (weighted)** | **65%** | |
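
The final figure is the weighted sum of the dimension scores in the table. A minimal sketch of that arithmetic, assuming each weight is the percentage shown in brackets next to the dimension name; the names `DIMENSIONS` and `weighted_score` are illustrative and not part of any existing tooling:

```python
# Dimension -> (weight %, score %), copied from the table above.
DIMENSIONS = {
    "Agentic AI depth": (25, 25),
    "Tech fit": (25, 65),
    "Remote fit": (25, 95),
    "Company culture fit": (15, 65),
    "IC/leadership balance": (10, 90),
}


def weighted_score(dimensions):
    """Return the weighted score as a percentage (0-100)."""
    total_weight = sum(weight for weight, _ in dimensions.values())
    if total_weight != 100:
        raise ValueError(f"weights sum to {total_weight}, expected 100")
    # Each term is weight% * score%; dividing by 100 yields a percentage.
    return sum(weight * score for weight, score in dimensions.values()) / 100


if __name__ == "__main__":
    print(f"Final (weighted): {weighted_score(DIMENSIONS):.0f}%")
```

Keeping weights and scores as whole percentages avoids floating-point rounding noise, and the weight check catches a mistyped weight before it skews the result.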

---

## Strengths

- Excellent compensation: €160k + equity, clearly above Luca's floor
- Fully remote EU — no location constraints
- Fast-moving, high-ownership environment
- PyTorch and distributed systems experience is applicable

---

## Weaknesses & Risks

- Agentic AI depth is very low — this is GPU/inference infrastructure engineering, not Luca's core domain
- Requires deep expertise in GPU performance tuning, quantisation, and CUDA — Luca's background is application-layer ML, not systems ML
- Anonymous client — company culture, domain, and stability unverifiable
- High competition: 66 applicants, 38 of them in one day; 67% are senior-level candidates

---

## Suggestions

- Only worth pursuing if open to an infrastructure/systems ML pivot for the compensation
- If applying, emphasise production ML deployment experience, MLOps, and large-scale system operation
- Ask the recruiter early for the client company name and more specifics on the stack (vLLM, TGI, TensorRT-LLM?)
- Frame Fenergo's production AI pipeline infrastructure work as relevant prior art

---

## Interview Tracker

| Stage | Date | Notes |
|---|---|---|
| Applied | | |
| Recruiter screen | | |
| Technical interview | | |
| Final round | | |
| Offer / Outcome | | |