This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior AI Inference Engineer in Latin America.
In this role, you will lead the design and deployment of advanced AI inference systems for high-profile clients in Media, Entertainment, Gaming, and Sports. You will be responsible for translating complex, ambiguous business problems into robust, real-time AI architectures capable of interpreting and reasoning about video and multi-modal content. Working across the full project lifecycle—from early discovery and pre-sales to architecture, implementation, and optimization—you will partner with technical teams and clients to deliver scalable, high-performance solutions on modern GPU and cloud infrastructure. This position requires hands-on expertise, innovation, and the ability to communicate complex technical concepts clearly to diverse stakeholders.
Accountabilities
* Architect, implement, and optimize end-to-end AI inference services and agentic pipelines using Python.
* Design autonomous AI agents that can interpret, reason about, and act on video and multi-modal inputs.
* Integrate Vision Language Models (e.g., GPT-4o, Gemini Pro Vision, LLaVA) into production-grade workflows.
* Utilize LLM/agent orchestration frameworks (LangGraph, AutoGen, Semantic Kernel, etc.) to manage complex visual AI tasks.
* Deploy and operate AI services on Kubernetes or similar platforms, ensuring reliability and scalability under heavy workloads.
* Architect distributed systems on AWS, balancing performance, cost, and resilience.
* Optimize workloads for modern NVIDIA GPU architectures (Ampere, Hopper, Blackwell) focusing on real-time, high-throughput media applications.
* Produce clear architecture diagrams and technical documentation for both technical and non-technical audiences.
* Provide technical leadership and guidance to project teams to ensure fidelity to architectural designs and solution goals.
* (Optional) Work with video tooling such as FFmpeg, GStreamer, NVENC/NVDEC, and modern codecs, or deploy AI to edge/hybrid environments.
Requirements
* Extensive professional experience designing and shipping AI/ML systems in production, with strong Python expertise.
* Proven track record of taking AI/ML models from prototype to robust, low-latency inference services.
* Hands-on experience building agentic systems, especially with computer vision or multi-modal inputs.
* Familiarity with Vision Language Model integration and orchestration frameworks for multi-modal tasks.
* Strong practical experience with Kubernetes and cloud-native distributed architectures (AWS preferred).
* Knowledge of modern NVIDIA GPU architectures and optimization techniques.
* Product-oriented mindset: able to align technical solutions with business outcomes and ROI.
* Excellent communication skills for collaborating with technical teams, clients, and C-level stakeholders.
* Self-starter, able to work independently in ambiguous or rapidly evolving environments.
* Nice-to-have: experience with FFmpeg, GStreamer, NVENC/NVDEC, OpenShift, NVIDIA Holoscan, Mojo, or AI deployment on edge/hybrid/on-prem environments.
Benefits
* Competitive compensation package.
* Fully remote work within North or South America.
* Exposure to high-impact projects with leading global clients in Media, Entertainment, Gaming, and Sports.
* Opportunity to work with cutting-edge AI technologies and modern GPU/cloud infrastructure.
* Professional growth through complex, real-world problem solving.
* Inclusive and diverse work environment.
Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.
When you apply, your profile goes through our AI-powered screening process designed to identify top talent efficiently and fairly.
Our AI evaluates your CV and LinkedIn profile thoroughly, analyzing your skills, experience, and achievements.
It compares your profile to the job's core requirements and past success factors to determine your match score.
Based on this analysis, we automatically shortlist the three candidates with the highest match to the role.
When necessary, our human team may perform an additional manual review to ensure no strong profile is missed.
The process is transparent, skills-based, and free of bias — focusing solely on your fit for the role. Once the shortlist is completed, we share it directly with the company that owns the job opening. The final decision and next steps (such as interviews or additional assessments) are then made by their internal hiring team.
Thank you for your interest.