Software Development Engineer - AI/ML, AWS Neuron, Multimodal Inference

Montà
Amazon
Posted 22h ago
Description

Overview

The Annapurna Labs team at AWS builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon's custom ML accelerators, Inferentia and Trainium. The AWS Neuron SDK includes an ML compiler, runtime, and application framework that integrates with popular ML frameworks like PyTorch and JAX to enable optimized ML inference and training performance. The Inference Enablement and Acceleration team works across the stack, from frameworks down to the hardware-software boundary, building infrastructure, innovating methods, and creating high-performance kernels for ML functions to maximize performance on AWS accelerators. The team combines deep hardware knowledge with ML expertise to push the boundaries of AI acceleration, and collaborates with customers and open-source ecosystems to enable models and reach peak performance at scale.

As part of the broader Neuron organization, you will architect and implement business-critical features and mentor a team of engineers. We operate in a fast-paced, startup-like culture with no blueprint, where we invent, experiment, and learn. This role is responsible for the development, enablement, and performance tuning of a wide variety of LLM model families, including large models such as the Llama family. You will work with compiler and runtime engineers to create, build, and tune distributed inference solutions on Trainium and Inferentia, optimizing inference performance across latency and throughput, from the system level up to PyTorch or JAX. Strong Python, system-level programming skills, and ML knowledge are required.
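To make the distributed-inference idea above concrete, here is a minimal, hedged sketch of tensor parallelism: a linear layer whose weight matrix is column-sharded across devices, with the concatenation standing in for the all-gather collective a real runtime would issue. Devices are simulated as plain Python list entries; none of the names below come from the Neuron SDK.

```python
import numpy as np

def column_shard(weight, num_devices):
    """Split a weight matrix column-wise, one shard per simulated device."""
    return np.split(weight, num_devices, axis=1)

def sharded_linear(x, shards):
    """Each 'device' computes a partial output from its shard; the final
    concatenation plays the role of an all-gather collective."""
    partials = [x @ w for w in shards]        # one matmul per device
    return np.concatenate(partials, axis=-1)  # all-gather along hidden dim

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of 4 tokens, hidden dim 8
W = rng.standard_normal((8, 16))   # full (unsharded) weight

shards = column_shard(W, num_devices=4)
assert np.allclose(sharded_linear(x, shards), x @ W)  # matches unsharded result
```

Column sharding keeps each device's matmul smaller (better latency per layer) at the cost of a collective per layer, which is exactly the latency/throughput trade-off the role describes tuning.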
You can learn more about Neuron at the following resources:

  • https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-cc/index.html
  • https://aws.amazon.com/machine-learning/neuron/
  • https://github.com/aws/aws-neuron-sdk
  • //how-silicon-innovation-became-the-secret-sauce-behind-awss-success/

Key Responsibilities

  • Design, develop, and optimize machine learning models and frameworks for deployment on custom ML hardware accelerators.
  • Participate in all stages of the ML system development lifecycle, including distributed-computing architecture design, implementation, performance profiling, hardware-specific optimizations, testing, and production deployment.
  • Build infrastructure to systematically analyze and onboard multiple models with diverse architectures.
  • Design and implement high-performance kernels and features for ML operations, leveraging the Neuron architecture and programming models.
  • Analyze and optimize system-level performance across multiple generations of Neuron hardware.
  • Conduct detailed performance analysis using profiling tools to identify and resolve bottlenecks.
  • Implement optimizations such as fusion, sharding, tiling, and scheduling.
  • Conduct comprehensive testing, including unit and end-to-end model testing with CI/CD pipelines.
  • Work directly with customers to enable and optimize their ML models on AWS accelerators.
  • Collaborate across teams to develop innovative optimization techniques.

A day in the life

You will collaborate with a cross-functional team of applied scientists, system engineers, and product managers to deliver state-of-the-art inference capabilities for Generative AI applications. You will debug performance issues, optimize memory usage, and shape the future of Neuron's inference stack across Amazon and the open-source community.
You'll design and code solutions that improve software architecture efficiency, create metrics, implement automation, and resolve the root causes of defects. You will also build high-impact solutions for our large customer base and participate in design discussions, code reviews, and stakeholder communication. You will work cross-functionally, providing technical input to business decisions in a startup-like development environment.

About the team

The Inference Enablement and Acceleration team fosters a builder's culture of experimentation, impact measurement, collaboration, technical ownership, and continuous learning. We support new members, provide mentorship, and promote knowledge sharing. We care about career growth and assign projects that develop engineering expertise, empowering team members to take on more complex tasks in the future.

Basic Qualifications

  • 3+ years of non-internship professional software development experience
  • Bachelor's degree in computer science or equivalent
  • 3+ years of non-internship design or architecture experience (design patterns, reliability, and scaling) with new and existing systems
  • Fundamentals of machine learning and LLMs (architecture, training, and inference lifecycles), with experience optimizing model execution
  • Software development experience in C++ and Python (proficiency in at least one), with a strong understanding of system performance, memory management, and parallel computing principles
  • Proficiency in debugging, profiling, and applying software engineering best practices in large-scale systems

Preferred Qualifications

  • Familiarity with PyTorch, JIT compilation, and AOT tracing
  • Familiarity with CUDA kernels or equivalent low-level ML kernels
  • Experience with performant kernel development (e.g., CUTLASS, FlashInfer)
  • Familiarity with tile-level programming semantics similar to Triton's
  • Experience with online/offline inference serving in production using vLLM, SGLang, TensorRT, or similar platforms
  • Deep understanding of computer architecture, OS-level software, and parallel computing

EEO and accommodations statements

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other protected status. If you require a workplace accommodation during the application or hiring process, please visit https:///content/en/how-we-hire/accommodations for more information. If your country/region isn't listed, please contact your Recruiting Partner.

Compensation

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $129,300/year to $223,600/year, depending on factors including location and experience. Amazon is a total compensation company; equity, sign-on payments, and other benefits may be provided. For more information, visit https:///workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.

Posting Information

Posted: January 29, 2026 (updated recently)
Job ID: J-18808-Ljbffr
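Among the optimizations the responsibilities list (fusion, sharding, tiling, scheduling), tiling is easy to show in miniature. The sketch below is a plain NumPy illustration of loop tiling for a matmul, assuming square tiles; real Neuron kernels operate on hardware tile buffers, and `tiled_matmul` is a hypothetical name, not an SDK function.

```python
import numpy as np

def tiled_matmul(a, b, tile=32):
    """Blocked matrix multiply: process the output in tile x tile blocks so
    each block's operands fit in fast memory before moving on. NumPy slicing
    clamps at array edges, so non-divisible shapes are handled for free."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must agree"
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                out[i0:i0 + tile, j0:j0 + tile] += (
                    a[i0:i0 + tile, k0:k0 + tile]
                    @ b[k0:k0 + tile, j0:j0 + tile]
                )
    return out

a = np.arange(12.0).reshape(3, 4)
b = np.arange(20.0).reshape(4, 5)
assert np.allclose(tiled_matmul(a, b, tile=2), a @ b)  # same result as untiled
```

The payoff of tiling is locality, not arithmetic: the same multiply-adds happen, but each block of operands is reused while it is resident in fast memory, which is the bottleneck the profiling work described above typically chases.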



© 2026 Jobijoba - All rights reserved
