Freelance agent evaluation engineer

Rome
Freelance
Mindrift
Sales
Published on 6 May
Description

Please submit your CV in English and indicate your level of English proficiency.
Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.
What This Opportunity Involves
We’re building a dataset to evaluate AI coding agents - how well a model handles real-world developer tasks.
You’ll create challenging tasks and evaluation criteria within realistic simulated environments:

* Build realistic developer environments - a virtual company with codebase, infrastructure, and context (tickets, docs, conversations) that forms a believable development history
* Design tasks from intermediate states of these environments - craft the prompt, define what “solved” means, and ensure the task is solvable by an AI agent
* Write tests that verify agent solutions - accept all valid approaches and reject incorrect ones, neither too strict nor too lenient
* Iterate on tasks and tests based on QA feedback - review agent solutions, analyze failures, and refine until the evaluation is fair and robust
What This Is NOT
* Not data labeling
* Not prompt engineering
* Not writing code from scratch - the agent writes most of the code; you guide and evaluate
What We Look For
* 5+ years in software development
* Core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, Redis
* Experience writing tests (functional, integration)
* English proficiency - B2+
Why This Is Hard
Frontier models are already good at coding. Creating a task that genuinely challenges the best models is non-trivial. You need to deeply understand where models fail and what scenarios reveal the difference between a good and a bad solution. Tasks have many valid solutions - writing tests that accept all correct solutions and reject incorrect ones is harder than it sounds.
How It Works
Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid
Effort Estimate
Tasks for this project are estimated to take 20 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted.
Compensation
Up to $40/hr equivalent, depending on level and pace. Tasks are estimated at 20 hours each; you set your own schedule.


© 2026 Jobijoba - All rights reserved
