Following the creation of a new internal structure, we are looking for an experienced Site Reliability Engineer (SRE) to join our Infrastructure team.
Responsibilities:
* System Reliability: Ensuring the reliability and availability of our platforms and technological systems through robust monitoring, reporting, and incident response procedures.
* Infrastructure Automation: Automating the deployment, scaling, and management of services and infrastructure components for critical applications like digital channels and branches.
* Resource Planning: Collaborating with cross-functional teams to forecast and plan future resource requirements for all infrastructure systems.
* Performance Optimization: Analyzing platform performance to improve efficiency, ensuring an optimal experience for users and end customers.
* Incident Management Support: Participating in troubleshooting sessions, supporting operational and application teams, analyzing monitoring data and root causes, and proposing solutions.
* Security: Supporting the implementation and maintenance of security best practices, and participating in vulnerability assessments and threat mitigation.
* Continuous Improvement: Improving system reliability through root cause analysis, incident reporting, and proactive maintenance and evolution of systems and platforms.
Required Experience:
* Excellent knowledge of Terraform and Ansible
* Understanding of containerization technologies (e.g., Docker, containerd)
* Expertise in Kubernetes management and components (e.g., ingresses, monitoring stacks, custom autoscalers)
* Strong troubleshooting skills
* Understanding of delivery systems (e.g., Helm, GitOps)
* Knowledge of at least one major cloud provider
* Scripting and programming skills (e.g., Bash, Python, Go)
* Understanding of networking
* Experience with databases like Oracle DB, MongoDB, PostgreSQL
Nice to Have:
* Experience with GCP, AWS, Azure
* Experience with distributed systems such as caching systems (e.g., Redis), message brokers (e.g., RabbitMQ), log collection systems (e.g., ELK)
What We Offer:
* Autonomy and responsibility: freedom to choose, try, fail, and learn
* Career growth: evaluations every six months to guide your development
* Continuous training: access to courses and learning opportunities with industry experts
Location: Reggio Emilia, Italy