# Confidential — Senior LLM Inference Engineer — Performance & GPU Optimization

| Field | Value |
|---|---|
| **Date found** | 2026-05-29 |
| **Company** | Confidential (client undisclosed) |
| **Role** | Senior LLM Inference Engineer — Performance & GPU Optimization |
| **Location** | Ireland — Remote |
| **Salary** | Undisclosed |
| **Job URL** | https://www.linkedin.com/jobs/view/4421790704/ |
| **Status** | New |

---

## Company Research

Recruiting agency — client undisclosed.

---

## Job Summary

**What they do:** Established enterprise software company building LLM-powered capabilities into products, needing deep inference performance expertise.

**The role:** Senior IC owning LLM inference optimization — latency, throughput, and cost-per-token at the GPU and serving-engine level.

**Core work:**
- Optimize LLM inference for latency, throughput, and cost at kernel and serving-engine level
- Profile and tune GPU performance (CUDA, TensorRT-LLM); apply quantization, speculative decoding, batching strategies
- Extend and optimize serving frameworks (vLLM, SGLang, Triton) where they fall short

**Stack:** Python · CUDA · TensorRT-LLM · vLLM · SGLang · Triton · NVIDIA GPUs · PyTorch

**Work style:** Fully remote Ireland; client identity undisclosed

---

## Score: 60%

| Dimension | Score | Justification |
|---|---|---|
| Agentic AI depth (25%) | 25% | Pure inference performance engineering — no agentic work; infrastructure one layer below AI applications |
| Tech fit (25%) | 45% | Python and PyTorch overlap, but the role requires deep CUDA/GPU performance expertise that is not in the target profile |
| Remote fit (25%) | 100% | Fully remote Ireland — no location friction |
| Company culture fit (15%) | 55% | Unknown enterprise software company, likely not an AI-native startup |
| IC/leadership balance (10%) | 90% | Pure IC role with full ownership of performance wins |
| **Final (weighted)** | **60%** | |
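
The final figure is the weighted sum of the five dimension scores in the table. A minimal sketch of that arithmetic follows, assuming the weights are the percentages in the first column and the result is rounded to the nearest whole percent. The names `DIMENSIONS` and `weighted_score` are illustrative only and are not part of any existing tooling.

```python
# Weights and scores copied from the score table, both in whole percent.
# DIMENSIONS and weighted_score are hypothetical names used for illustration.
DIMENSIONS = {
    "Agentic AI depth": (25, 25),
    "Tech fit": (25, 45),
    "Remote fit": (25, 100),
    "Company culture fit": (15, 55),
    "IC/leadership balance": (10, 90),
}


def weighted_score(dimensions):
    """Return the weighted average score in percent."""
    total_weight = sum(weight for weight, _ in dimensions.values())
    if total_weight != 100:
        raise ValueError(f"weights must sum to 100, got {total_weight}")
    # Each term is weight% * score%; dividing by 100 brings it back to percent.
    return sum(weight * score for weight, score in dimensions.values()) / 100


final = weighted_score(DIMENSIONS)
print(f"{final:.2f}% (reported as {round(final)}%)")
```

The unrounded total is 59.75%, which matches the 60% shown in the table once rounded.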

---

## Strengths

- Fully remote Ireland with no restrictions — ideal work style
- Only 6 applicants at time of discovery — very early applicant window
- Bologna alumni connection noted in the job posting
- Rare specialisation that commands high compensation

---

## Weaknesses & Risks

- Role requires deep GPU performance engineering (CUDA kernel-level expertise) — significant skill gap from target profile
- Pure inference optimization, no agentic AI work whatsoever
- Client completely anonymous

---

## Suggestions

- Only apply if pivoting toward LLM infrastructure/performance engineering
- Would require significant upskilling in CUDA/GPU optimization
- Ask the recruiter to reveal the client before proceeding

---

## Interview Tracker

| Stage | Date | Notes |
|---|---|---|
| Applied | | |
| Recruiter screen | | |
| Technical interview | | |
| Final round | | |
| Offer / Outcome | | |
