Infrastructure / DevOps Engineer

Alto

European Tech Recruit

Published 11h ago

Description

Infrastructure DevOps Engineer | Italy (Remote/Hybrid)

A leading European innovator in high-performance computing and AI infrastructure is seeking an Infrastructure DevOps Engineer to architect the foundation of its compute capabilities. The company operates at the cutting edge of multi-GPU cluster management, bridging the gap between sophisticated hardware provisioning and seamless cloud-native orchestration.

The Role

The focus of this position is the end-to-end reliability of a heterogeneous compute environment. The engineer will be responsible for making large-scale deployments reproducible and ensuring that developers have frictionless access to high-power resources.

Key Responsibilities

- Provision and maintain high-performance CPU/GPU clusters across multiple physical locations.
- Implement dynamic compute and storage scaling to meet fluctuating workload demands.
- Design hardware and software-level storage solutions, including distributed filesystems and storage tiering.
- Manage container orchestration through Kubernetes and Docker for both production and R&D workloads.
- Develop infrastructure as code (IaC) utilising Terraform and Ansible.
- Optimise job scheduling and resource allocation via Slurm and Kubernetes.
- Establish robust observability using Prometheus, Grafana, and IPMI.
- Conduct system-level performance profiling, focusing on GPU utilisation and I/O throughput.
- Oversee secure networking, VPN management, and disaster recovery protocols.

Technical Profile

The ideal candidate brings a deep understanding of Linux system administration and the unique challenges of managing bare-metal and virtualised hardware.

Essential Experience

- Advanced Linux administration and networking principles.
- Proven expertise with Docker and Kubernetes orchestration.
- Hands-on experience with IaC tools (Terraform or Ansible).
- Background in HPC environments and job scheduling via Slurm.
- Experience in hardware infrastructure management (IPMI, BMC) and server maintenance.
- Ability to design storage systems such as NFS, Ceph, or other distributed filesystems.

Preferred Skills

- Familiarity with bare-metal provisioning tools like MaaS.
- Experience navigating cloud service environments (AWS or similar).

Why Join?

The company offers the opportunity to work on highly complex, large-scale infrastructure projects that directly power the next generation of AI development. This is a chance to move beyond standard cloud DevOps and dive into the intricacies of hardware-level performance and global compute distribution.

How to Apply

If this role is of interest, please apply directly on LinkedIn or send a copy of your CV.

By applying to this role you understand that we may collect your personal data and store and process it on our systems. For more information please see our Privacy Notice.
