Senior Engineer – AI Model Compression Research at Axelera AI
Company Overview
Axelera AI is a European, high-growth, Series B startup revolutionizing the AI landscape with its in-memory computing platform. We specialize in creating AI hardware and software optimized for high-performance inference across high-end edge computing, embodied AI, and server-side AI deployments. We are seeking passionate, innovative research engineers to join our team and help shape the future of AI.
Role Overview
We are looking for an AI Research Engineer with a strong focus on model compression. This role involves developing cutting-edge techniques to make Generative AI models more efficient for real-time inference, from high-end edge systems to large-scale server deployments. You will ensure our models are optimized for memory footprint, computational efficiency, and performance without sacrificing accuracy.
This position offers an exciting opportunity to work at the intersection of advanced machine learning, in-memory computing, and high-performance AI inference on state-of-the-art hardware architectures.
Responsibilities:
1. Model Compression: Design and implement advanced techniques such as pruning, quantization, weight sharing, and knowledge distillation to improve memory efficiency and computational performance.
2. Performance Tuning: Optimize compressed models for high-throughput, low-latency inference tailored to our in-memory platform.
3. Collaboration: Work with AI researchers, software, and hardware engineers to integrate model optimizations into our AI platform, ensuring effectiveness across deployments.
4. Innovation: Keep abreast of the latest developments in AI and model compression research, pushing the boundaries of model-size reduction without performance loss.
5. Deployment & Testing: Implement best practices for model testing, deployment, and continuous improvement to ensure scalable production models.
Requirements:
1. Proven experience in model compression techniques like pruning, quantization, low-rank factorization, and knowledge distillation.
2. Technical Skills: Expertise in deep learning frameworks such as TensorFlow, PyTorch, or JAX; experience optimizing models for resource-constrained environments; familiarity with distributed systems, in-memory computing, or high-performance environments; solid understanding of deep learning algorithms and trade-offs in model compression.
3. Strong knowledge of the latest AI/ML advancements, especially in compression and distillation of generative models (transformers, diffusion models).
4. Excellent collaboration and communication skills, capable of working in a fast-paced startup environment and explaining complex technical concepts clearly.
Preferred Qualifications:
1. PhD or higher in Computer Science, Machine Learning, AI, or related fields.
2. 5+ years of relevant post-graduate work experience.
3. Research experience in model compression, efficient inference, or deploying AI models on resource-limited devices.
4. Familiarity with deployment frameworks such as TensorRT and ONNX.
5. Passion for solving real-world AI challenges in dynamic, high-performance environments.
Location:
This role is based in Italy, with relocation support to Bologna, Florence, or Milan for international candidates.
Why Join Us?
* Impact: Contribute to groundbreaking AI technology powering future applications.
* Culture: Join a diverse, innovative, collaborative team focused on learning and growth.
* Growth: Significant opportunities to influence product and AI strategy at a Series B startup.
* Compensation: Competitive salary, equity options, and benefits.
How to Apply?
Send your resume and a brief cover letter explaining your enthusiasm and how your experience aligns with our goals in model compression.
At Axelera AI, we value diversity and are committed to creating an inclusive environment for all applicants.