<p>Job Requisition ID: JR</p>
<p>Job Category: Engineering</p>
<p>Time Type: Full time</p>
<p>We are seeking highly skilled and motivated software engineers to join us and build AI inference systems that serve large-scale models with extreme efficiency. You’ll architect and implement high-performance inference stacks, optimize GPU kernels and compilers, drive industry benchmarks, and scale workloads across multi-GPU, multi-node, and multi-cloud environments. You’ll collaborate across inference, compiler, scheduling, and performance teams to push the frontier of accelerated computing for AI.</p>
<h3>What You’ll Be Doing</h3>
<ul>
<li>Contribute features to vLLM that let the newest models take advantage of the latest NVIDIA GPU hardware features; profile and optimize the inference framework (vLLM) with methods such as speculative decoding, data/tensor/expert/pipeline parallelism, and prefill-decode disaggregation.</li>
<li>Develop, optimize, and benchmark GPU kernels (hand-tuned and compiler-generated) using techniques such as fusion, autotuning, and memory/layout optimization; build and extend high-level DSLs and compiler infrastructure to boost kernel developer productivity while approaching peak hardware utilization.</li>
<li>Define and build inference benchmarking methodologies and tools; contribute both new benchmarks and NVIDIA’s submissions to the industry-leading MLPerf Inference benchmark suite.</li>
<li>Architect the scheduling and orchestration of containerized large-scale inference deployments on GPU clusters across clouds.</li>
<li>Conduct and publish original research that pushes the Pareto frontier of ML Systems; survey recent publications and integrate research ideas and prototypes into NVIDIA’s software products.</li>
</ul>
<h3>What We Need To See</h3>
<ul>
<li>Bachelor’s degree (or equivalent experience) in Computer Science (CS), Computer Engineering (CE), or Software Engineering (SE) with 7+ years of experience; alternatively, a Master’s degree in CS/CE/SE with 5+ years of experience; or a PhD with a thesis and top-tier publications in ML Systems, GPU architecture, or high-performance computing.</li>
<li>Strong programming skills in Python and C/C++; experience with Go or Rust is a plus; solid CS fundamentals: algorithms, data structures, operating systems, computer architecture, parallel programming, distributed systems, and deep learning theory.</li>
<li>Knowledgeable and passionate about performance engineering in ML frameworks (e.g., PyTorch) and inference engines (e.g., vLLM and SGLang).</li>
<li>Familiarity with GPU programming and performance: CUDA, memory hierarchy, streams, NCCL; proficiency with profiling/debug tools (e.g., Nsight Systems/Compute).</li>
<li>Experience with containers and orchestration (Docker, Kubernetes, Slurm); familiarity with Linux namespaces and cgroups.</li>
<li>Excellent debugging, problem-solving, and communication skills; ability to excel in a fast-paced, multi-functional setting.</li>
</ul>
<h3>Ways to Stand Out from the Crowd</h3>
<ul>
<li>Experience building and optimizing LLM inference engines (e.g., vLLM, SGLang).</li>
<li>Hands‑on work with ML compilers and DSLs (e.g., Triton, TorchDynamo/Inductor, MLIR/LLVM, XLA), GPU libraries (e.g., CUTLASS), and features (e.g., CUDA Graph, Tensor Cores).</li>
<li>Experience contributing to containerization/virtualization technologies such as containerd/CRI‑O/CRIU.</li>
<li>Experience with cloud platforms (AWS/GCP/Azure), infrastructure as code, CI/CD, and production observability.</li>
<li>Contributions to open‑source projects and/or publications; please include links to GitHub pull requests, published papers, and artifacts.</li>
</ul>
<p>Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. For Poland: the base salary range is 292,500 PLN - 507,000 PLN for Level 4, and 375,000 PLN - 650,000 PLN for Level 5.</p>