Job Title: Infrastructure Engineer
Description:
* In this role, you will design and implement scalable infrastructure for deployment, training pipelines, resource management, and workflow orchestration.
* You will support development and ML teams across cloud and on-premise environments.
Responsibilities:
* Design and implement deployment strategies for software and assets to cloud environments.
* Create reliable pipelines for delivering resources to on-premise systems and keeping them up to date.
* Manage multi-environment infrastructure (development, staging, production).
* Build and maintain CI/CD pipelines for efficient software delivery.
* Design testing infrastructure to ensure software quality.
* Improve developer experience through tooling and workflow optimization.
* Manage hardware compute clusters and implement container orchestration systems for development, experimentation, and diverse workloads.
* Provision, configure, and optimize heterogeneous resources across distributed systems for parallel computing tasks.
* Build and maintain infrastructure for ML workloads, including training pipelines and data processing.
* Design and optimize resource allocation systems for efficient AI/ML computation.
Requirements:
* Bachelor's degree in Computer Science or related technical field.
* 4+ years of experience in infrastructure engineering or DevOps roles.
* Experience with containerization (Docker) and orchestration (e.g., Kubernetes).
* Knowledge of infrastructure-as-code tools (Terraform, Ansible) and CI/CD pipelines.
* Experience with HPC clusters (GPU and CPU) and job scheduling systems (e.g., Slurm).
* MLOps experience, including ML pipeline automation and experiment tracking.