European Tech Recruit are working closely with a market-leading 3D scanning company, based in Bressanone, who are looking for a talented Infrastructure / DevOps Engineer to join their team.
In this role, you will join a company that uses state-of-the-art computer vision and machine learning algorithms to produce high-quality, relightable 3D models of objects and products at scale.
You will build and maintain the foundation of their compute infrastructure. You'll work on hardware provisioning, networking, container orchestration, and deployment pipelines across cloud and on-premise environments. This role focuses on making their multi-GPU clusters reliable, their deployments reproducible, and their developers productive.
Responsibilities as Infrastructure / DevOps Engineer
* Provision, configure, and maintain heterogeneous compute clusters (CPU/GPU) across multiple physical locations.
* Implement dynamic compute and storage provisioning based on workload demands.
* Design storage solutions at both the hardware and software levels (NAS, distributed filesystems, storage tiering).
* Implement and manage container orchestration systems (Kubernetes, Docker) for development and production workloads.
* Design and maintain infrastructure as code using tools like Terraform and Ansible.
* Build and optimize job scheduling and resource allocation systems (Slurm, Kubernetes).
* Set up monitoring, alerting, and observability infrastructure (Prometheus, Grafana, IPMI).
* Profile and optimize system-level performance: GPU utilization, memory bandwidth, I/O throughput, network latency.
* Manage networking, VPNs, and secure access across distributed systems.
* Handle reliability concerns: hardware failure detection, job checkpointing, disaster recovery.
Requirements
* Strong Linux system administration knowledge.
* Experience with containerization (Docker) and orchestration (Kubernetes).
* Knowledge of infrastructure as code (Terraform, Ansible).
* Experience with HPC clusters and job scheduling (Slurm).
* Familiarity with monitoring solutions (Prometheus, Grafana).
* Understanding of networking principles and implementation.
* Experience with hardware infrastructure management (IPMI, BMC, server maintenance).
* Knowledge of storage systems design (NFS, Ceph, distributed filesystems).
* Experience with cloud services (AWS or others).
* Familiarity with bare-metal provisioning (MAAS).
If this role is of interest, please apply directly on LinkedIn or send a copy of your CV to nh@eu-recruit.com.