Cambrex Profarmaco Milano is looking for a curious and dynamic intern interested in Artificial Intelligence.
For our site in Paullo (Milan), we are looking for a trainee with expertise in ML methods and AI to join our Analytical Development team for an industrial secondment focused on building an AI-powered HPLC recommendation engine. This project leverages proprietary pharmaceutical data to develop machine learning models that predict optimal chromatographic conditions for new compounds.
Responsibilities:
* Design and curate a structured experimental knowledge database linking molecular representations (SMILES, InChI, fingerprints), physicochemical descriptors, chromatographic method parameters, and experimental outcomes, with a focus on data quality, reproducibility, and design suitable for ML;
* Develop a feature selection pipeline for classical descriptors (RDKit, Mordred) and learned representations (molecular fingerprints, graph-based embeddings);
* Research, train, and benchmark predictive models for chromatographic outcomes (retention behaviour, mobile phase strength, column selectivity class), exploring both interpretable models and state-of-the-art approaches;
* Design a molecular similarity module grounded in chemical space geometry, evaluating distance metrics and embedding spaces for nearest-neighbour method retrieval from historical data;
* Build a recommendation engine that unifies predictive modelling and the similarity module, with uncertainty quantification, confidence scoring, and explainability to support trust and adoption by domain scientists;
* Extend the engine into an agentic LLM interface that allows natural language interaction with the underlying models and database;
* Validate the system on held-out experimental data, document methodology to publication standard, and present research outputs to both technical and domain-expert audiences.
Qualifications and Skills:
* Degree in Computer Science/Engineering or a closely related field;
* Deep understanding of ML and AI;
* Strong research instincts: ability to identify the right problem formulation, design controlled experiments, and critically evaluate model behaviour rather than just benchmark metrics;
* Proficiency in Python and relevant libraries (scikit-learn, pandas, NumPy, PyTorch, TensorFlow), and comfort reading and adapting research code.
Soft Skills:
* Excellent interpersonal and communication skills;
* Flexibility and the ability to work effectively in a team;
* Proactivity, a strong focus on results, and problem-solving skills;
* Ability to work independently and communicate technical results to a non-specialist audience.
Considered a plus:
* Familiarity with molecular representations and cheminformatics tools (SMILES, fingerprints, graph neural networks for molecules) or willingness to learn;
* Active interest in explainable and interpretable ML (XAI), particularly in applied scientific contexts where trust and transparency are critical;
* Hands-on experience with LLM tool-use, function calling, or agentic frameworks, or conceptual grounding in how LLMs interact with external systems;
* Exposure to scientific, industrial, or experimental datasets with inherent noise, class imbalance, or sparse labelling, common in real-world R&D settings.
What You Will Gain:
* Access to a proprietary industrial HPLC dataset not available in academic settings;
* An AI research challenge with real-world constraints;
* Immersion in cheminformatics and pharmaceutical analytical R&D, with direct collaboration with domain scientists who will challenge and sharpen your modelling decisions;
* Research outputs aligned with your doctoral trajectory: publishable methodology and a working system demonstrating scientific AI in industrial settings;
* Mentoring from both cheminformatics and domain experts, with genuine intellectual exchange;
* Professional networking opportunities.
Location: Cambrex Profarmaco Milano Srl, Paullo (MI). On-site or hybrid model (to be discussed).
Contract: We offer a one-year scholarship contract; details will be clarified during the interview process.