# Iris.ai — Senior NLP / ML Researcher (LLM Evaluation & Agentic Systems)

| Field | Value |
|---|---|
| **Date found** | 2026-05-23 |
| **Company** | Iris.ai |
| **Role** | Senior NLP / ML Researcher (LLM Evaluation & Agentic Systems) |
| **Location** | European Economic Area — Fully Remote |
| **Salary** | Undisclosed (claimed 25% above local market) + ESOP |
| **Job URL** | https://www.linkedin.com/jobs/view/4368722322/ |
| **Status** | Rejected (2026-05-25) |

---

## Company Research

| Field | Value |
|---|---|
| **Headquarters** | Oslo, Norway |
| **Founded** | 2015 |
| **Employees** | ~41 (per LinkedIn) |
| **LinkedIn** | https://www.linkedin.com/company/iris-ai/ — 10,582 followers |
| **Website** | https://iris.ai |
| **Blog** | — |

- **Product:** Context-first enterprise AI platform (Neuralith, Axion, RSpace) covering data ingestion, advanced RAG, agentic orchestration, and LLM evaluation for complex technical knowledge domains.
- **Customers:** Regulated industries — telecom, manufacturing, pharmaceuticals, government, banking; enterprises where accuracy is critical.
- **Notable:** Backed by EIC Accelerator; named AWS Frontier AI Startup; ISO 27001 certified; published WISDM (2017) and ConSens (2025) LLM evaluation research; 40+ NLP/LLM specialists across Europe.

---

## Job Summary

**What they do:** Iris.ai builds an agentic AI platform that scales expert-level domain knowledge across enterprises, with deep focus on accuracy, evaluation, and responsible AI.

**The role:** Senior IC researcher driving novel NLP/LLM research directions (evaluation, agentic reasoning, multilingual NLP) and translating them into production capabilities inside the Iris.ai platform.

**Core work:**
- Research and implement novel methods for LLM evaluation, uncertainty estimation, and RAG robustness in production systems
- Design agentic reasoning and control mechanisms (when to reason vs. act, inference-time steering)
- Co-author and lead EU/national research grant proposals (Horizon, EIC, national funding)
- Translate research into product features deployed in enterprise environments

**Stack:** Python · PyTorch · Transformers · Hugging Face · RAG pipelines · Multi-agent frameworks · AWS · Docker

**Work style:** Fully remote, within European time zones; research-driven culture with regular publications, conferences, and internal knowledge-sharing.

---

## Score: 83%

| Dimension | Score | Justification |
|---|---|---|
| Agentic AI depth (25%) | 70% | Genuine agentic reasoning research and LLM evaluation that feed directly into a production agentic platform. The research framing (PhD and publications required) leans toward applied research rather than pure engineering. |
| Tech fit (25%) | 82% | Strong overlap: PyTorch, Transformers, Hugging Face, RAG pipelines, multi-agent frameworks, AWS, Docker. LangChain/LangGraph not explicitly listed but multi-agent frameworks mentioned. |
| Remote fit (25%) | 100% | Fully remote, EEA-wide — no on-site requirement. |
| Company culture fit (15%) | 75% | AI-native product startup (~41 employees), founded on NLP/AI research, values accuracy and rigor over demo culture. Good fit for a high-standards engineer. |
| IC/leadership balance (10%) | 88% | Pure IC researcher role; informal mentoring and knowledge sharing expected, no people management. |
| **Final (weighted)** | **83%** | Strong match on remote fit, tech stack, and AI-native culture; agentic depth is real but research-framed. |
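
The final figure can be checked against the per-dimension rows. The sketch below is only a sanity check of the arithmetic, assuming the final score is a plain weighted sum of the five dimension scores; the names `DIMENSIONS` and `weighted_score` are illustrative and not part of any existing tooling.

```python
# Weights and scores copied from the table above.
# DIMENSIONS and weighted_score are hypothetical names used for illustration.
DIMENSIONS = {
    "Agentic AI depth": (0.25, 70),
    "Tech fit": (0.25, 82),
    "Remote fit": (0.25, 100),
    "Company culture fit": (0.15, 75),
    "IC/leadership balance": (0.10, 88),
}


def weighted_score(dimensions):
    """Return the weighted sum of scores, checking that the weights sum to 1."""
    total_weight = sum(weight for weight, _ in dimensions.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return sum(weight * score for weight, score in dimensions.values())


final = weighted_score(DIMENSIONS)
print(f"{final:.2f}% (rounded: {round(final)}%)")
```

Under that assumption the weighted sum comes to 83.05, which rounds to the 83% shown in the table.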

---

## Strengths

- Fully remote EEA — ideal location fit
- Genuine agentic AI and LLM evaluation work at the core of the product, not a side project
- Small AI-native startup with strong technical culture and equity upside
- Deep NLP/RAG/agentic stack alignment with Luca's current work at Fenergo
- High-impact: the research feeds directly into production systems used by enterprise clients

---

## Weaknesses & Risks

- PhD required ("PhD in ML, NLP, CS or related field") — Luca does not hold a PhD; this is a significant qualification gap
- Publications requirement ("publications in ML/NLP conferences or journals") — Luca has no listed publications
- EU grant proposal writing expected — non-trivial additional workload outside core engineering
- Salary undisclosed despite claim of "25% above market"; verify before committing time
- Role is closer to research engineering than pure engineering, so the work may drift toward an academic pace

---

## Suggestions

- Apply and address the PhD gap directly: make the case that 10+ years of production ML, including frontier LLM/agentic systems, is comparable preparation for an applied research role
- Emphasise the Fenergo agentic document parser project — directly relevant to LLM evaluation and RAG in regulated environments
- Highlight the NLP depth: transformers, semantic chunking, RAG, document understanding all align with Iris.ai's core work
- Ask about the PhD requirement flexibility and whether strong applied experience is accepted in lieu
- Ask about salary range before advanced stages

---

## Interview Tracker

| Stage | Date | Notes |
|---|---|---|
| Applied | | |
| Recruiter screen | | |
| Technical interview | | |
| Final round | | |
| Offer / Outcome | | |
| Rejected | 2026-05-25 | Too academic — research experience and PhD required, not a fit. |