Member of Technical Staff, Software Co-Design AI HPC Systems – MAI Superintelligence Team
Our team’s mission is to architect, co-design, and productionize next-generation AI systems at datacenter scale. We operate at the intersection of models, systems software, networking, storage, and AI hardware, optimizing end-to-end performance, efficiency, reliability, and cost. Our work spans today’s frontier AI workloads and directly shapes the next generation of accelerators, system architectures, and large-scale AI platforms. We pursue this mission through deep hardware–software co-design, combining rigorous systems thinking with hands-on engineering. The team invests heavily in understanding real production workloads (large-scale training, inference, and emerging multimodal models) and translating those insights into concrete improvements across the stack: from kernels, runtimes, and distributed systems, all the way down to silicon-level trade-offs and datacenter-scale architectures.
This role sits at the boundary between exploration and production. You will work closely with internal infrastructure, hardware, compiler, and product teams, as well as external partners across the hardware and systems ecosystem. Our operating model emphasizes rapid ideation and prototyping, followed by disciplined execution to drive high-leverage ideas into production systems that operate at massive scale.
In addition to delivering real-world impact on large-scale AI platforms, the team actively contributes to the broader research and engineering community. Our work aligns closely with leading communities in ML systems, distributed systems, computer architecture, and high-performance computing, and we regularly publish, prototype, and open-source impactful technologies where appropriate.
About the Team
We build foundational AI infrastructure that enables large-scale training and inference across diverse workloads and rapidly evolving hardware generations. Our work directly shapes how AI systems are designed, deployed, and scaled today and into the future. Engineers on this team operate with end-to-end ownership, deep technical rigor, and a strong bias toward real-world impact.
Microsoft Superintelligence Team
Microsoft Superintelligence team’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
This role is part of Microsoft AI’s Superintelligence Team. The MAIST is a startup-like team inside Microsoft AI, created to push the boundaries of AI toward Humanist Superintelligence: ultra-capable systems that remain controllable, safety-aligned, and anchored to human values. Our mission is to create AI that amplifies human potential while ensuring humanity remains firmly in control. We aim to deliver breakthroughs that benefit society by advancing science, education, and global well-being.
We’re also fortunate to partner with incredible product teams, giving our models the chance to reach billions of users and create immense positive impact. If you’re a brilliant, highly ambitious, low-ego individual, you’ll fit right in. Come and join us as we work on our next generation of models!
Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction.
Responsibilities
Lead the co-design of AI systems across hardware and software boundaries, spanning accelerators, interconnects, memory systems, storage, runtimes, and distributed training/inference frameworks.
Drive architectural decisions by analyzing real workloads, identifying bottlenecks across compute, communication, and data movement, and translating findings into actionable system and hardware requirements.
Co-design and optimize parallelism strategies, execution models, and distributed algorithms to improve scalability, utilization, reliability, and cost efficiency of large-scale AI systems.
Develop and evaluate what-if performance models to project system behavior under future workloads, model architectures, and hardware generations, providing early guidance to hardware and platform roadmaps.
Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-generation accelerators, including custom kernels, scheduling strategies, and memory optimizations.
Influence and guide AI hardware design at system and silicon levels, including accelerator microarchitecture, interconnect topology, memory hierarchy, and system integration trade-offs.
Lead cross-functional efforts to prototype, validate, and productionize high-impact co-design ideas, working across infrastructure, hardware, and product teams.
Mentor senior engineers and researchers, set technical direction, and raise the overall bar for systems rigor, performance engineering, and co-design thinking across the organization.
Qualifications
Minimum Qualifications
Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field, or equivalent practical experience.
10+ years of experience (or equivalent depth) working across systems software, hardware architecture, or AI infrastructure, with demonstrated impact at scale.
Strong background in one or more of the following areas:
AI accelerator or GPU architectures
Distributed systems and large-scale AI training/inference
High-performance computing (HPC) and collective communications
ML systems, runtimes, or compilers
Performance modeling, benchmarking, and systems analysis
Hardware–software co-design for AI workloads
Proficiency in systems-level programming (e.g., C/C++, CUDA, Python) and performance-critical software development.
Proven ability to work across organizational boundaries and influence technical decisions involving multiple stakeholders.
Preferred Qualifications
Experience designing or operating large-scale AI clusters for training or inference.
Deep familiarity with LLMs, multimodal models, or recommendation systems, and their systems-level implications.
Experience with accelerator interconnects and communication stacks (e.g., NCCL, MPI, RDMA, high-speed Ethernet or InfiniBand).
Background in performance modeling and capacity planning for future hardware generations.
Prior experience contributing to or leading hardware roadmaps, silicon bring-up, or platform architecture reviews.
Publications, patents, or open-source contributions in systems, architecture, or ML systems are a plus.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
Member of Technical Staff – Software Engineer (AI Infra) – MAI Superintelligence Team
Responsibilities
- Develop and tune scalable pretraining software for NVIDIA GB200 NVL72 (CX8) and AMD MIxxx architectures.
- Benchmark GB200 and AMD MIxxx GPU clusters.
- Gather data and insights to develop the pretraining compute roadmap.
- Care deeply about conversational AI and its deployment.
- Actively contribute to the development of AI models that are powering our innovative products.
- Find a path to get things done despite roadblocks to get your work into the hands of users quickly and iteratively.
- Enjoy working in a fast-paced, design-driven, product development cycle.
- Embody our Culture and Values.
Qualifications
Required Qualifications:
- Bachelor’s Degree in Computer Science or related technical discipline AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
- Experience with generative AI.
- Experience with distributed computing.
Preferred Qualifications:
- Bachelor’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR Master’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
- Experience in leading technical projects and supporting architectural decisions with data.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Member of Technical Staff, AI Systems Engineer – Microsoft Superintelligence
We are building next-generation customized AI silicon designed to accelerate AI workloads with unprecedented efficiency. We are looking for an exceptional Systems Engineer to bridge the gap between our custom hardware and modern AI inference frameworks.
We build foundational AI infrastructure that enables large-scale training and inference across diverse workloads and rapidly evolving hardware generations. Our work directly shapes how AI systems are designed, deployed, and scaled today and into the future. Engineers on this team operate with end-to-end ownership, deep technical rigor, and a strong bias toward real-world impact.
Microsoft Superintelligence team’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
This role is part of Microsoft AI’s Superintelligence Team. The MAIST is a startup-like team inside Microsoft AI, created to push the boundaries of AI toward Humanist Superintelligence: ultra-capable systems that remain controllable, safety-aligned, and anchored to human values. Our mission is to create AI that amplifies human potential while ensuring humanity remains firmly in control. We aim to deliver breakthroughs that benefit society by advancing science, education, and global well-being.
We’re also fortunate to partner with incredible product teams, giving our models the chance to reach billions of users and create immense positive impact. If you’re a brilliant, highly ambitious, low-ego individual, you’ll fit right in. Come and join us as we work on our next generation of models!
The Role
As a Senior AI Systems Engineer, you will own the software integration layer between our custom AI chip’s proprietary SDK and SGLang, a state-of-the-art serving framework for Large Language Models (LLMs) and Vision-Language Models. You will be responsible for ensuring that our silicon can seamlessly run SGLang inference workloads at peak performance, bypassing the traditional CUDA ecosystem entirely.
Responsibilities
- Framework Integration: Architect and develop the backend integration to make our custom AI chip a first-class citizen in SGLang.
- Custom Operator Development: Write custom C++ / PyTorch extensions that map SGLang’s primitive operations (e.g., RadixAttention, FlashAttention, matrix multiplications) to our custom chip’s proprietary software layer.
- Performance Optimization: Profile and optimize end-to-end LLM inference latency, throughput, and memory utilization (e.g., PagedAttention) on our hardware.
- Cross-Functional Collaboration: Work closely with our hardware architecture and compiler teams to provide feedback on our custom software stack and silicon design based on framework-level bottlenecks.
- Testing & Deployment: Build robust testing pipelines to validate model accuracy and performance parity against standard GPU baselines.
Qualifications
- BS, MS, or PhD in Computer Science, Computer Engineering, or a related field.
- Software engineering experience focusing on systems programming, ML infrastructure, or AI compilers.
- Expertise in Python: Deep understanding of memory management and concurrent programming.
- Experience with LLM Inference Engines: Hands-on experience modifying or extending frameworks like SGLang, vLLM, DeepSpeed-FastGen, or TensorRT-LLM.
- PyTorch Internals: Strong experience writing PyTorch C++ extensions and custom operators.
- Hardware Interfacing: Proven track record of integrating machine learning workloads with hardware accelerators (GPUs, TPUs, NPUs) using custom SDKs, APIs, or low-level drivers.
- Prior experience working on non-CUDA software ecosystems (e.g., AMD ROCm, AWS Neuron, Google XLA).
- Familiarity with AI compilers and intermediate representations (MLIR, Apache TVM, OpenAI Triton).
- Strong understanding of underlying LLM architectures (Transformers, MoE) and state-of-the-art attention algorithms (FlashAttention v2/v3).
- Previous experience at an AI silicon startup or working on custom accelerators (e.g., Google TPU, AWS Trainium).
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Member of Technical Staff – Data Scientist
We’re looking for data scientists to help build the next generation of post-training methods for frontier models at Microsoft AI. You’ll join a small, high-impact team working across all stages of post-training, with a focus on evaluation design, high-quality training data, and scalable data pipelines for state-of-the-art foundation models.
In this role, you’ll help turn raw model capability into reliable, aligned, and measurable performance improvements, directly shaping how frontier models behave in real-world deployments.
About the Role:
Microsoft AI is building the next generation of frontier models that power Copilot and other large-scale AI experiences. The Post-Training team is responsible for transforming powerful pretrained models into robust, aligned, and high-performing systems used by millions of people worldwide.
Our work focuses on improving general quality, instruction following, coding and math ability, tool use, agentic behaviors, personality, and other critical model capabilities. We operate across the full post-training lifecycle — from data generation and curation, to evaluation and diagnostics, to reward modeling and reinforcement learning.
We are a small, highly autonomous team that works closely with pre-training, product, and engineering partners to rapidly iterate on ideas, run large-scale experiments, and safely advance model capabilities. Each team member owns meaningful parts of the post-training pipeline and has direct access to the compute, data, and decision-making needed to move quickly from insight to production.
Microsoft Superintelligence Team
This role is part of Microsoft AI’s Superintelligence Team. The MAIST is a startup-like team inside Microsoft AI, created to push the boundaries of AI toward Humanist Superintelligence—ultra-capable systems that remain controllable, safety-aligned, and anchored to human values. Our mission is to create AI that amplifies human potential while ensuring humanity remains firmly in control. We aim to deliver breakthroughs that benefit society—advancing science, education, and global well-being.
We’re also fortunate to partner with incredible product teams, giving our models the chance to reach billions of users and create immense positive impact. If you’re a brilliant, highly ambitious, low-ego individual, you’ll fit right in. Come and join us as we work on our next generation of models!
Responsibilities
Design evaluations of advanced model capabilities and use them to drive rapid, high-signal iteration loops
Work with vendors to produce high-quality evaluation and training data
Build data pipelines to produce high-quality evaluation and training data
Build data flywheels to hill-climb on model weaknesses, using data from various surfaces where our models are deployed
Ensure optimal quality, quantity and coverage of data across our post-training stages
Run post-training experiments and ablations to produce models that climb our evals
Embody our culture and values.
We’re Looking For People Who:
Have deep experience with LLMs, either training them or applying them in production
Have developed production-scale data pipelines for synthesizing, curating, or processing large quantities of data
Can design, run, and interpret large-scale ML experiments with careful statistical and empirical reasoning.
Possess strong generalist engineering and mathematical skills.
Have clear written and verbal communication, and the ability to collaborate effectively with researchers, engineers and other disciplines.
Bonus skills: Demonstrated SOTA results in any area of large-scale training, inference, or evaluation.
Qualifications
Required skills
Hands‑on experience with large language models, including training or applying them in production (not just prompting)
Designing and running post‑training experiments (evals, ablations, preference tuning / RLHF‑style methods)
Building and owning scalable data pipelines for training and evaluation data
Strong Python skills for ML experimentation, data processing, and analysis
Solid statistical, experimental, and general engineering fundamentals
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Member of Technical Staff, High Performance Computing Engineer – MAI Superintelligence Team
Overview
Microsoft AI is looking for experienced High Performance Computing Engineers (Member of Technical Staff) to help build and scale the infrastructure that trains our frontier models and powers the next evolution of our personal AI, Copilot. This role offers a rare opportunity to work on some of the largest supercomputers in the world.
Microsoft Superintelligence Team
Microsoft Superintelligence team’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
This role is part of Microsoft AI’s Superintelligence Team. The MAIST is a startup-like team inside Microsoft AI, created to push the boundaries of AI toward Humanist Superintelligence—ultra-capable systems that remain controllable, safety-aligned, and anchored to human values. Our mission is to create AI that amplifies human potential while ensuring humanity remains firmly in control. We aim to deliver breakthroughs that benefit society—advancing science, education, and global well-being.
We’re also fortunate to partner with incredible product teams, giving our models the chance to reach billions of users and create immense positive impact. If you’re a brilliant, highly ambitious, low-ego individual, you’ll fit right in. Come and join us as we work on our next generation of models!
Responsibilities
- Design, operate, and maintain large-scale HPC environments, drawing on hands-on engineering experience in production settings.
- Own the deployment, configuration, and day-to-day operation of HPC schedulers (e.g., SLURM, Kubernetes), ensuring reliable and efficient job scheduling at scale.
- Serve as a technical owner for at least one core HPC domain (GPU compute, high-performance storage, networking, or similar), including ongoing maintenance, performance tuning, and troubleshooting of massive clusters.
- Develop and maintain automation and tooling using Bash and/or Python to improve cluster reliability, observability, and operational efficiency.
- Partner closely with researchers and engineers to support their workloads, troubleshoot cluster usage issues, and triage failed or underperforming jobs to resolution.
- Drive work forward independently by navigating ambiguity and technical roadblocks, delivering incremental improvements that get capabilities into users’ hands quickly.
- Enjoy working in a fast-paced, design-driven product development environment, balancing stability with rapid iteration and experimentation.
- Embody our Culture and Values.
Qualifications
Required Qualifications:
- Bachelor’s degree in Computer Science or related technical field AND 4+ years technical engineering experience with deploying or operating on-premises or cloud high-performance clusters, AND 4+ years experience working with high-scale training clusters (e.g., frameworks/tools such as NVIDIA InfiniBand clusters, SLURM, Kubernetes, or Ray), AND 4+ years experience building scalable services on top of public cloud infrastructure such as Azure, AWS, or GCP
- OR equivalent experience.
Preferred Qualifications:
- Master’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with deploying or operating on-premises or cloud high-performance clusters, AND 6+ years experience working with high-scale training clusters (e.g., frameworks/tools such as NVIDIA InfiniBand clusters, SLURM, Kubernetes, or Ray), AND 6+ years experience building scalable services on top of public cloud infrastructure such as Azure, AWS, or GCP
- OR equivalent experience.
- Experience with LLM training clusters
- Experience working with AI platforms, frameworks, and APIs
- Experience using machine learning frameworks, including using, deploying, and scaling large language models, either personally or professionally.
- Experience working with large-scale HPC or GPU systems (e.g., NVIDIA H100/GB200 or equivalent).
- Ability to identify, analyze, and resolve complex technical issues, ensuring optimal performance, scalability, and user experience.
- Dedication to writing clean, maintainable, and well-documented code with a focus on application quality, performance, and security.
- Demonstrated interpersonal skills and ability to work closely with cross-functional teams, including product managers, designers, and other engineers.
- Ability to clearly communicate complex technical concepts to both technical and non-technical stakeholders.
- Passion for learning new technologies and staying up to date with industry trends, best practices, and emerging technologies.
- Ability to work in a fast-paced environment, manage multiple priorities, and adapt to changing requirements and deadlines.
- Proven ability to collaborate and contribute to a positive, inclusive work environment, fostering knowledge sharing and growth within the team.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
Technical Program Manager – AI/ML
At Microsoft AI, we are on a mission to train the world’s most capable AI frontier models, pushing the boundaries of scale, performance, and product deployment. We’re tackling some of the most challenging problems in deep learning at scale. As a team, we will deliver one of the best foundation models in the world, forming the foundation of many initiatives across Microsoft AI.
Help deliver one of the best foundational models in the world at Microsoft AI.
Microsoft Superintelligence Team
Microsoft Superintelligence team’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
This role is part of Microsoft AI’s Superintelligence Team. The MAIST is a startup-like team inside Microsoft AI, created to push the boundaries of AI toward Humanist Superintelligence—ultra-capable systems that remain controllable, safety-aligned, and anchored to human values. Our mission is to create AI that amplifies human potential while ensuring humanity remains firmly in control. We aim to deliver breakthroughs that benefit society—advancing science, education, and global well-being.
We’re looking for highly motivated and detail-oriented Technical Program Managers to help bring our vision to life. We are seeking outstanding individuals excited about contributing to the next generation of systems that will transform the field. We are looking for candidates who:
- Deeply understand the pipeline of collecting data, training, evaluating, and serving language models and multimodal models.
- Have experience working side-by-side with AI researchers and engineers.
- Thrive in a 0-to-1, scrappy, innovative environment.
- Are passionate about managing high-stakes, time-sensitive, large-scale programs.
- Take initiative and enjoy finding paths through complexity in a fast-paced environment.
- Are comfortable owning projects that span offices, teams, and time zones, can coordinate different workstreams, and relentlessly drive to unblock progress.
- Demonstrate a proactive attitude and enthusiasm for exploring new methods and technologies.
- Possess strong technical curiosity and judgment.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction.
Responsibilities
- Coordinate projects and programs including all elements of end-to-end program planning, timelines, milestones, performance metrics, risk anticipation/mitigation and resource needs for programs and product cycles.
- Collaborate with product teams, engineers, researchers, and external partners to identify gaps and drive timelines toward resolution and mitigation.
- Leverage data and analytics to identify opportunities for improvement, track progress, and measure the impact of quality and efficiency programs.
- Foster a culture of collaboration, continuous improvement, and growth.
- Own the status of key projects, proactively identifying risks and proposing solutions to ensure timely delivery.
- Communicate program strategies, progress, and results to executive leadership and key stakeholders, advocating for quality and efficiency within the team.
- Work closely with teams on infrastructure, data engineering, pre-training, post-training, and product feedback.
- Advance the AI frontier responsibly.
- Embody Microsoft’s culture and values.
Qualifications
Required Qualifications
- Experience developing AI/ML models — including working with data, training models, and evaluating model quality.
- Experience leading complex technical programs end‑to‑end — planning work, managing timelines, milestones, and risks.
- Experience working closely with engineers, researchers, and product teams to deliver technical outcomes.
- Experience using data and metrics to track progress, spot issues early, and guide decisions.
Microsoft will accept applications and process offers for these roles on an ongoing basis.
Member of Technical Staff, AI Post-Training – MAI Superintelligence Team
We are looking for candidates who:
- Are passionate about shipping models into products that users will love
- Will thrive in a highly collaborative, fast-paced environment
- Have a high degree of craftsmanship and pay close attention to details
- Demonstrate a proactive attitude and enthusiasm for exploring new methods and technologies
- Are willing to meaningfully contribute as individuals with multiple responsibilities and can adjust to shifting priorities
Responsibilities
- Develop data collection, evaluation, and finetuning methods for models.
- Design hypotheses and experiment plans for rapidly iterating on model performance.
- Prototype new model features and capabilities and collaborate with engineers and researchers across Microsoft AI to make them a reality.
- Collaborate with pretraining and product platform teams to establish good vertical integration and ship models that Copilot users love.
- Embody our culture and values.
Qualifications
- Bachelor’s Degree in Computer Science or related technical discipline AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- Expertise in post-training of AI models
- Demonstrated experience in large-scale AI.
- Passion for conversational AI and its deployment.
- Demonstrated written and verbal communication skills with the ability to work closely with cross-functional teams, including product managers, designers, and other engineers.
- Passion for learning new technologies and staying up to date with industry trends, best practices, and emerging technologies in AI.
- Proven ability to collaborate and contribute to a positive, inclusive work environment, fostering knowledge sharing and growth within the team.
- Proven research track record in a domain-related field, supported by exceptional papers.
Member of Technical Staff, Machine Learning – MAI Superintelligence Team
As a Member of Technical Staff – Machine Learning (AI Team), you will create LLMs for general-purpose capabilities and for products. You may be responsible for developing new methods to train core LLM capabilities (including agentive ones), collecting data, evaluating LLMs, creating data flywheels, building tooling for LLM training and evals, writing production-quality code, and creating new user-facing features. You should be comfortable creating reinforcement learning data, fine-tuning, training classifiers, or engineering prompts to support Microsoft products and the Cloud API. We’re looking for someone with experience in machine learning and software engineering who is also an effective communicator and great teammate. The right candidate takes the initiative, is user-centered, and enjoys building world-class AI experiences and products in a fast-paced environment.
Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction.
Microsoft Superintelligence Team
Microsoft Superintelligence team’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
This role is part of Microsoft AI’s Superintelligence Team. The MAIST is a startup-like team inside Microsoft AI, created to push the boundaries of AI toward Humanist Superintelligence—ultra-capable systems that remain controllable, safety-aligned, and anchored to human values. Our mission is to create AI that amplifies human potential while ensuring humanity remains firmly in control. We aim to deliver breakthroughs that benefit society—advancing science, education, and global well-being.
We’re also fortunate to partner with incredible product teams, giving our models the chance to reach billions of users and create immense positive impact. If you’re a brilliant, highly ambitious, and low-ego individual, you’ll fit right in. Come and join us as we work on our next generation of models!
Responsibilities
- Leverage subject matter expertise to improve model quality for interactive and agentive experiences.
- Oversee data acquisition or generation efforts, ensuring that the data meets the model needs.
- Generalize machine learning (ML) solutions into repeatable frameworks.
- Lead evaluation efforts of models, including those deployed within Microsoft products and the Cloud API.
- Track advances in industry and academia, identify relevant state-of-the-art research, and adapt algorithms and/or techniques to drive innovation and develop new solutions.
- Independently write efficient, readable, extensible code and model pipelines.
- Commit to a customer-oriented focus by acknowledging customer needs and perspectives and building AI products that delight customers.
Qualifications
Required Qualifications:
- Bachelor’s Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Preferred Qualifications:
- Doctorate in Computer Science, Machine Learning, Human-Centered AI or related field AND 2+ year(s) experience (e.g., finetuning models with supervision or reinforcement learning, understanding and fixing data quality and curation, working with collaborators on creating new products).
- OR Master’s Degree in Computer Science, Machine Learning, or related field AND 5+ years experience (e.g., managing structured and unstructured data, developing and debugging models, creating infrastructure for AI-powered products).
- OR Bachelor’s Degree in Computer Science, Mathematics, Machine Learning, Physics, or related field AND 7+ years data-science experience (e.g., managing structured and unstructured data, applying machine learning techniques and driving product direction).
- Demonstrated engineering experience or research experience (e.g. creating or leading the creation of a feature in a different company, complex graduate work, research papers, or other experience).
- 4+ years of data science experience (e.g., managing structured and unstructured data, applying machine learning techniques and driving product direction).
- Experience prompting, evaluating, and working with large language models.
- Experience writing production-quality Python code.