Senior reinforcement learning builder

Milano

Neutralis S.R.L

Posted on 26 November

Description

About Neutralis

Neutralis is building the learning brain for industrial heat-pump plants. We fuse model-based RL with digital twins and strict safety constraints to turn messy plant telemetry into better decisions, hour by hour. This is paper-to-plant work with real impact on energy, reliability, and decarbonization.

The challenge

Industrial plants are complex, safety-critical, and non-stationary. Off-policy data, partial observability, actuator limits, drift, and human-in-the-loop operations make naïve RL fail fast. Your mission is to own a safe, reproducible path from data to control: offline → simulated → shadow → live, with guardrails at every step.

What you'll do

* Own the RL/control roadmap: architect offline RL + model-based control with a digital twin in the loop; define safety envelopes and verification gates.
* Build the pipeline: data curation, policy learning, simulation/gym environments, evaluation harnesses, and promotion criteria from sim to plant.
* Ship reproducible research to production: baselines, ablations, and clear experiment tracking; transform results into services/APIs.
* Lead and mentor a 15–20 person cohort of MSc/PhD thesis students and research engineers; set standards for code, experiments, and writing.
* Partner with domain experts (HVAC/OT/BMS) on constraints, actuation limits, failure modes, and alarm triage.
* Land safety: define fallback controllers, interlocks, and shadow-mode strategies; quantify risk and uncertainty.
* Collaborate across the stack with our FastAPI services, time-series store, and observability/ML Ops.
* Communicate: write crisp technical notes, contribute to publications where useful, and present results to partners.

What you'll bring

* Track record shipping RL/controls for physical systems (energy, robotics, process, automotive, etc.).
* Deep hands-on skill in offline RL (e.g., CQL/IQL/TD3-BC) and model-based RL/MPC; comfort with system identification and constrained optimization.
* Strong engineering in Python and PyTorch or JAX; experience with experiment tracking (MLflow/W&B), containers, and CI.
* Rigor around evaluation and safety: distribution shift, uncertainty, guardrails, fallback policies.
* Ability to lead, mentor, and scale a research-engineering team.
* Clear writing and stakeholder communication.
* Degree in CS/EE/ME/Controls or equivalent experience.

Nice to have

* Familiarity with OT/BMS/historians (OPC UA, Modbus, BACnet, PI), time-series modeling, anomaly detection.
* Experience with digital twins/simulation, domain randomization, and sim-to-real transfer.
* MLOps in AWS; FastAPI, PostgreSQL + a time-series DB.
* Italian language skills.

Why Neutralis

* Hard problems, real plants: your work moves real energy, not just a leaderboard.
* Ownership: technical stewardship from first principles to deployment.
* Talent platform: lead a serious thesis cohort and shape a next-gen team.
* Impact: measurable COP uplift, energy savings, reliability gains.
* Compensation: competitive package with meaningful equity; conference and equipment budget.

Location & working model

On-site in Milan (primary). Some flexibility for exceptional candidates. Occasional visits to partner sites.

What success looks like (6–12 months)

* A documented, reproducible RL pipeline from data → policy → evaluation → shadow.
* Benchmarked policies that outperform baselines in sim and shadow with clear safety margins.
* A mentored student cohort delivering publishable experiments and production-ready components.
* An accepted path to controlled live trials with partners.

How to apply

Apply on LinkedIn or send a short note with "RL — Senior", a link to work you're proud of (GitHub/Google Scholar/website), and availability. DMs welcome.

* Neutralis is an equal-opportunity employer. We value clarity, safety, and results over pedigree. If you've shipped control systems that matter, we want to hear from you.
