About ALLSIDES
ALLSIDES is redefining how the world experiences 3D content. We combine physically accurate scanning and generative AI to power content creation workflows for e-commerce, virtual environments, and immersive experiences. Our clients include global brands like adidas, Meta, Amazon, and Zalando.
We run a rapidly scaling photorealistic 3D scanning operation, capturing tens of thousands of assets annually while training next-generation AI models. As an NVIDIA Inception member, we collaborate with leading research institutions and actively participate in top-tier conferences in 3D computer vision and AI.
More info: https://www.allsides.tech | https://blogs.nvidia.com/blog/covision-adidas-rtx-ai/
Position Overview
We're looking for an Infrastructure & DevOps Engineer to build and maintain the foundation of our compute infrastructure. You'll work on hardware provisioning, networking, container orchestration, and deployment pipelines across cloud and on-premise environments. This role focuses on making our multi-GPU clusters reliable, our deployments reproducible, and our developers productive.
Main Responsibilities
- Provision, configure, and maintain heterogeneous compute clusters (CPU/GPU) across multiple physical locations
- Implement dynamic compute and storage provisioning based on workload demands
- Design storage solutions at both the hardware and software levels (NAS, distributed filesystems, storage tiering)
- Implement and manage container orchestration systems (Kubernetes, Docker) for development and production workloads
- Design and maintain infrastructure as code using tools like Terraform and Ansible
- Build and optimize job scheduling and resource allocation systems (Slurm, Kubernetes)
- Set up monitoring, alerting, and observability infrastructure (Prometheus, Grafana, IPMI)
- Profile and optimize system-level performance: GPU utilization, memory bandwidth, I/O throughput, network latency
- Manage networking, VPNs, and secure access across distributed systems
- Handle reliability concerns: hardware failure detection, job checkpointing, disaster recovery
Qualifications
- Strong Linux system administration knowledge
- Experience with containerization (Docker) and orchestration (Kubernetes)
- Knowledge of infrastructure as code (Terraform, Ansible)
- Experience with HPC clusters and job scheduling (Slurm)
- Familiarity with monitoring solutions (Prometheus, Grafana)
- Understanding of networking principles and implementation
- Experience with hardware infrastructure management (IPMI, BMC, server maintenance)
- Knowledge of storage systems design (NFS, Ceph, distributed filesystems)
Nice to Have
- Experience with cloud services (AWS or others)
- Familiarity with bare-metal provisioning (MAAS)
What We Offer
- Compensation that reflects your experience, including stock options
- Lunch vouchers for working days
- Relocation assistance
- Flexible working hours and work-from-home policy
- Family-friendly environment
- Amazing office space in South Tyrol, located at the Durst Group
- Personal and professional growth opportunities
You don't have to tick every box to apply; your drive and passion matter most!
This role is located on-site in Brixen/Bressanone, Italy. If you are interested, please apply with your CV attached to careers@allsides.tech.