
Suzhou, China

Senior Firmware Engineering

Location: Suzhou, China
Job Number: 200002087-en-1
City: Suzhou
Team: Other
Country: China
Discipline: Software Engineering
Overview
We are building the next generation of intelligent consumer devices, combining advanced hardware, connectivity, and AI to create seamless everyday experiences. Our team brings together specialists in acoustics, sensing, and system design, united by the goal of pushing the boundaries of what small, low-power devices can achieve. As a firmware engineer on our team, you will play a key role in enabling real-time intelligence on resource-constrained hardware. Your work will involve designing and optimizing embedded systems, integrating sensors and connectivity modules, and ensuring ultra-low-power performance without compromising user experience. You will collaborate closely with cross-disciplinary experts in hardware, software, and AI to bring innovative ideas from prototype to product.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

• You’ll ensure wake word recognition is implemented and optimized at the firmware level for responsive on-device AI.

• You’ll develop and maintain firmware support for Bluetooth and Wi-Fi modules, enabling seamless device connectivity.

• You’ll handle real-time processing of audio and visual data, optimizing performance and power efficiency on embedded hardware.

• You’ll help implement high-performance solutions across teams while maintaining a quality checklist with help from other engineers.

• You’ll also monitor telemetry data and perform basic analyses under guidance to triangulate failures.

• You will respond to incidents by identifying the level of impact, troubleshooting basic issues, and deploying appropriate fixes.

• You’ll also develop an understanding of prescriptive guidance for security, privacy, and compliance standards.

• You will share information across disciplines within your feature team.

• You’ll also support your work with others by managing dependencies and actively seeking out essential information.

• You will improve the development and operations of systems, platforms, or product features by actively seeking to develop an understanding of key learnings, insights, and best practices. You’ll do this by participating in design reviews, incident drills and debriefs, and regular meetings.



Qualifications

• Bachelor’s Degree in Electrical Engineering, Computer Engineering, Computer Science, or related field AND 5+ years technical engineering experience

  OR Master’s Degree in Electrical Engineering, Computer Engineering, Computer Science, or related field AND 4+ years technical engineering experience

  OR Doctorate in Electrical Engineering, Computer Engineering, Computer Science, or related field AND 1+ year(s) technical engineering experience

  OR equivalent experience.

• Strong experience developing embedded firmware on SoCs from Realtek, BES, Qualcomm, or similar platforms, with deep understanding of low-power and real-time constraints.

• Proven experience in firmware development for wearable devices such as smart glasses or TWS earbuds, including audio, sensor, and connectivity modules.   


This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.




Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.


Beijing, China

Senior Firmware Engineering

Location: Beijing, China
Job Number: 200002087-en-2
City: Beijing
Team: Other
Country: China
Discipline: Software Engineering
Overview
We are building the next generation of intelligent consumer devices, combining advanced hardware, connectivity, and AI to create seamless everyday experiences. Our team brings together specialists in acoustics, sensing, and system design, united by the goal of pushing the boundaries of what small, low-power devices can achieve. As a firmware engineer on our team, you will play a key role in enabling real-time intelligence on resource-constrained hardware. Your work will involve designing and optimizing embedded systems, integrating sensors and connectivity modules, and ensuring ultra-low-power performance without compromising user experience. You will collaborate closely with cross-disciplinary experts in hardware, software, and AI to bring innovative ideas from prototype to product.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

• You’ll ensure wake word recognition is implemented and optimized at the firmware level for responsive on-device AI.

• You’ll develop and maintain firmware support for Bluetooth and Wi-Fi modules, enabling seamless device connectivity.

• You’ll handle real-time processing of audio and visual data, optimizing performance and power efficiency on embedded hardware.

• You’ll help implement high-performance solutions across teams while maintaining a quality checklist with help from other engineers.

• You’ll also monitor telemetry data and perform basic analyses under guidance to triangulate failures.

• You will respond to incidents by identifying the level of impact, troubleshooting basic issues, and deploying appropriate fixes.

• You’ll also develop an understanding of prescriptive guidance for security, privacy, and compliance standards.

• You will share information across disciplines within your feature team.

• You’ll also support your work with others by managing dependencies and actively seeking out essential information.

• You will improve the development and operations of systems, platforms, or product features by actively seeking to develop an understanding of key learnings, insights, and best practices. You’ll do this by participating in design reviews, incident drills and debriefs, and regular meetings.



Qualifications

• Bachelor’s Degree in Electrical Engineering, Computer Engineering, Computer Science, or related field AND 5+ years technical engineering experience

  OR Master’s Degree in Electrical Engineering, Computer Engineering, Computer Science, or related field AND 4+ years technical engineering experience

  OR Doctorate in Electrical Engineering, Computer Engineering, Computer Science, or related field AND 1+ year(s) technical engineering experience

  OR equivalent experience.

• Strong experience developing embedded firmware on SoCs from Realtek, BES, Qualcomm, or similar platforms, with deep understanding of low-power and real-time constraints.

• Proven experience in firmware development for wearable devices such as smart glasses or TWS earbuds, including audio, sensor, and connectivity modules.   


This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.






Beijing, China

Senior Software Engineer

Location: Beijing, China
Job Number: 200001008-en-1
City: Beijing
Team: Other
Country: China
Discipline: Software Engineering
Overview
We are seeking brilliant and passionate engineers to work with us on the most interesting and challenging problems of AI Infrastructure development. We are a team focusing on large language model optimization, and we are at the forefront of driving innovation in large-scale AI infrastructure. You will be instrumental in designing and implementing the high-performance, massively scalable infrastructure required to deploy frontier LLM models through innovative GPU kernel, compression, scheduling and parallelization optimizations, directly contributing to groundbreaking advancements in the field. This role is not just about maintaining systems; it’s about architecting the future. If you are passionate about AI systems, low-level performance optimization, and solving hard cross-discipline engineering problems, we invite you to join us and help shape the future of AI at Microsoft. This is your opportunity to make a defining impact.

Responsibilities

– Keep up to date with and utilize the latest developments in LLM system optimization.

– Discover and solve impactful technical problems, advance state-of-the-art LLM technologies, and translate ideas into production.

– Optimize LLM inference workloads through innovative kernel, algorithm, scheduling, and parallelization technologies.

– Continuously maintain internal LLM inference infrastructure.



Qualifications

– A bachelor’s degree or higher in computer science, engineering, or a related field; a PhD is preferred

– Strong programming skills in Python and C/C++

– 2+ years of experience in machine learning system development and optimization

Preferred Qualifications:

– 2+ years of experience in CUDA kernel development and optimization

– Experience in optimizing communication layer / kernels for deep learning systems

– Experience in machine learning model compression

– Experience with different hardware, such as both NVIDIA and AMD GPUs, is a plus

– A growth mindset and a passion for learning new things


This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.






Beijing, China

Principal Software Engineer

Location: Beijing, China
Job Number: 200001006-en-1
City: Beijing
Team: Other
Country: China
Discipline: Software Engineering
Overview
As Microsoft continues to push the boundaries of AI, we are seeking brilliant and passionate engineers to work with us on the most interesting and challenging problems of AI Infrastructure development. We are a team focusing on large language model optimization, and we are at the forefront of driving innovation in large-scale AI infrastructure. You will be instrumental in designing and implementing the high-performance, massively scalable infrastructure required to deploy frontier LLM models through innovative GPU kernel, compression, scheduling and parallelization optimizations, directly contributing to groundbreaking advancements in the field. This role is not just about maintaining systems; it’s about architecting the future. If you are passionate about AI systems, low-level performance optimization, and solving hard cross-discipline engineering problems, we invite you to join us and help shape the future of AI at Microsoft. This is your opportunity to make a defining impact.

Responsibilities

– Keep up to date with and utilize the latest developments in LLM system optimization.

– Take the lead in designing innovative system optimization solutions for internal LLM workloads.

– Optimize LLM inference workloads through innovative kernel, algorithm, scheduling, and parallelization technologies.

– Continuously develop and maintain internal LLM inference infrastructure.

– Discover new LLM system optimization needs and innovations.



Qualifications

– A bachelor’s degree or higher in computer science, engineering, or a related field; a PhD is preferred

– Strong programming skills in Python and C/C++

– 5+ years of experience in machine learning system development and optimization

Preferred Qualifications:

– 5+ years of experience in CUDA kernel development and optimization

– Experience in optimizing communication layer / kernels for deep learning systems

– Experience in machine learning model compression

– Experience with different hardware, such as both NVIDIA and AMD GPUs, is a plus

– A growth mindset and a passion for learning new things


This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.






Mountain View, United States

Member of Technical Staff, Hardware Health – MAI Superintelligence Team

Location: Mountain View, United States
Job Number: 200009249-en-1
City: Mountain View
Team: Microsoft Superintelligence
Country: United States
Discipline: Software Engineering
Overview

Microsoft AI operates one of the world’s most advanced AI training infrastructures, featuring multi-gigawatt clusters spanning tens of thousands of high-performance GPUs, ultra-low-latency NVLink/NVSwitch networks, and innovative liquid-cooling systems. Our team is seeking a Member of Technical Staff, Hardware Health, to ensure these systems deliver sustained reliability, performance, and availability across exascale-class deployments. 

We work closely with research, hardware, datacenter, and platform engineering teams to develop predictive health models, failure detection frameworks, and autonomous remediation systems that keep our AI clusters operating at frontier scale. 

Our newly formed organization, Microsoft AI, is dedicated to advancing Copilot and other consumer AI products and research. The team is responsible for Copilot, Bing, Edge, and generative AI research. Join us and help shape the future of personal computing. 

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we embrace a growth mindset, innovate to empower others, and collaborate to achieve shared goals. Every day, we build on our values of respect, integrity, and accountability to foster a culture of inclusion where everyone can thrive at work and beyond. 

Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction. 



Responsibilities
  • Design and develop next-generation hardware health monitoring and diagnostic frameworks for large GPU clusters (NVL16/NVL72/GB200+ scale).
  • Build predictive analytics pipelines leveraging telemetry, power, and thermal data to anticipate hardware degradation and systemic issues.
  • Collaborate with silicon, firmware, and datacenter engineers to identify root causes and remediate large-scale hardware anomalies.
  • Define system health KPIs (e.g., NIS/RIS, MTBF, failure domain analysis) and integrate them into real-time observability platforms.
  • Lead incident triage for high-impact GPU, network, and cooling issues across distributed clusters.
  • Drive automation in health management to reduce manual intervention to the top 5% of anomalies.
  • Partner with cross-functional teams to influence hardware design for reliability, thermal efficiency, and serviceability.


Qualifications

Required Qualifications: 

  • Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.

Preferred Qualifications: 

  • Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.
  • Experience working with large-scale HPC or GPU systems (NVIDIA H100/GB200 or equivalent).
  • Deep understanding of GPU architecture, high-speed interconnects (NVLink, InfiniBand, RoCE), and large datacenter topologies.
  • Proficiency in hardware telemetry, diagnostics, or failure analysis tools.
  • Experience with exascale-class systems or cloud-scale AI clusters.
  • Familiarity with reliability modeling, machine learning-based anomaly detection, or predictive maintenance.
  • Contributions to large-scale infrastructure operations, supercomputing centers, or AI hardware design. 


Software Engineering IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 – $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay

Software Engineering IC6 – The typical base pay range for this role across the U.S. is USD $163,000 – $296,400 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $220,800 – $331,200 per year.



This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.






Redmond, United States

Member of Technical Staff, Hardware Health – MAI Superintelligence Team

Location: Redmond, United States
Job Number: 200009249-en-2
City: Redmond
Team: Microsoft Superintelligence
Country: United States
Discipline: Software Engineering
Overview

Microsoft AI operates one of the world’s most advanced AI training infrastructures, featuring multi-gigawatt clusters spanning tens of thousands of high-performance GPUs, ultra-low-latency NVLink/NVSwitch networks, and innovative liquid-cooling systems. Our team is seeking a Member of Technical Staff, Hardware Health, to ensure these systems deliver sustained reliability, performance, and availability across exascale-class deployments. 

We work closely with research, hardware, datacenter, and platform engineering teams to develop predictive health models, failure detection frameworks, and autonomous remediation systems that keep our AI clusters operating at frontier scale. 

Our newly formed organization, Microsoft AI, is dedicated to advancing Copilot and other consumer AI products and research. The team is responsible for Copilot, Bing, Edge, and generative AI research. Join us and help shape the future of personal computing. 

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we embrace a growth mindset, innovate to empower others, and collaborate to achieve shared goals. Every day, we build on our values of respect, integrity, and accountability to foster a culture of inclusion where everyone can thrive at work and beyond. 

Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction. 



Responsibilities
  • Design and develop next-generation hardware health monitoring and diagnostic frameworks for large GPU clusters (NVL16/NVL72/GB200+ scale).
  • Build predictive analytics pipelines leveraging telemetry, power, and thermal data to anticipate hardware degradation and systemic issues.
  • Collaborate with silicon, firmware, and datacenter engineers to identify root causes and remediate large-scale hardware anomalies.
  • Define system health KPIs (e.g., NIS/RIS, MTBF, failure domain analysis) and integrate them into real-time observability platforms.
  • Lead incident triage for high-impact GPU, network, and cooling issues across distributed clusters.
  • Drive automation in health management to reduce manual intervention to the top 5% of anomalies.
  • Partner with cross-functional teams to influence hardware design for reliability, thermal efficiency, and serviceability.


Qualifications

Required Qualifications: 

  • Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.

Preferred Qualifications: 

  • Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.
  • Experience working with large-scale HPC or GPU systems (NVIDIA H100/GB200 or equivalent).
  • Deep understanding of GPU architecture, high-speed interconnects (NVLink, InfiniBand, RoCE), and large datacenter topologies.
  • Proficiency in hardware telemetry, diagnostics, or failure analysis tools.
  • Experience with exascale-class systems or cloud-scale AI clusters.
  • Familiarity with reliability modeling, machine learning-based anomaly detection, or predictive maintenance.
  • Contributions to large-scale infrastructure operations, supercomputing centers, or AI hardware design. 


Software Engineering IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 – $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay

Software Engineering IC6 – The typical base pay range for this role across the U.S. is USD $163,000 – $296,400 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $220,800 – $331,200 per year.



This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.






New York, United States

Member of Technical Staff, Hardware Health – MAI Superintelligence Team

Location: New York, United States
Job Number: 200009249-en-3
City: New York
Team: Microsoft Superintelligence
Country: United States
Discipline: Software Engineering
Overview

Microsoft AI operates one of the world’s most advanced AI training infrastructures, featuring multi-gigawatt clusters spanning tens of thousands of high-performance GPUs, ultra-low-latency NVLink/NVSwitch networks, and innovative liquid-cooling systems. Our team is seeking a Member of Technical Staff, Hardware Health, to ensure these systems deliver sustained reliability, performance, and availability across exascale-class deployments. 

We work closely with research, hardware, datacenter, and platform engineering teams to develop predictive health models, failure detection frameworks, and autonomous remediation systems that keep our AI clusters operating at frontier scale. 

Our newly formed organization, Microsoft AI, is dedicated to advancing Copilot and other consumer AI products and research. The team is responsible for Copilot, Bing, Edge, and generative AI research. Join us and help shape the future of personal computing. 

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we embrace a growth mindset, innovate to empower others, and collaborate to achieve shared goals. Every day, we build on our values of respect, integrity, and accountability to foster a culture of inclusion where everyone can thrive at work and beyond. 

Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction. 



Responsibilities
  • Design and develop next-generation hardware health monitoring and diagnostic frameworks for large GPU clusters (NVL16/NVL72/GB200+ scale).
  • Build predictive analytics pipelines leveraging telemetry, power, and thermal data to anticipate hardware degradation and systemic issues.
  • Collaborate with silicon, firmware, and datacenter engineers to identify root causes and remediate large-scale hardware anomalies.
  • Define system health KPIs (e.g., NIS/RIS, MTBF, failure domain analysis) and integrate them into real-time observability platforms.
  • Lead incident triage for high-impact GPU, network, and cooling issues across distributed clusters.
  • Drive automation in health management to reduce manual intervention to the top 5% of anomalies.
  • Partner with cross-functional teams to influence hardware design for reliability, thermal efficiency, and serviceability.


Qualifications

Required Qualifications: 

  • Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.

Preferred Qualifications: 

  • Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python 
    • OR equivalent experience.
  • Experience working with large-scale HPC or GPU systems (NVIDIA H100/GB200 or equivalent).
  • Deep understanding of GPU architecture, high-speed interconnects (NVLink, InfiniBand, RoCE), and large datacenter topologies.
  • Proficiency in hardware telemetry, diagnostics, or failure analysis tools.
  • Experience with exascale-class systems or cloud-scale AI clusters.
  • Familiarity with reliability modeling, machine learning-based anomaly detection, or predictive maintenance.
  • Contributions to large-scale infrastructure operations, supercomputing centers, or AI hardware design. 


Software Engineering IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 – $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay

Software Engineering IC6 – The typical base pay range for this role across the U.S. is USD $163,000 – $296,400 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $220,800 – $331,200 per year.


This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.




Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

Redmond, United States

Member of Technical Staff, Evaluations Engineering – MAI Superintelligence Team

Location
Redmond, United States
Job Number
200009256-en-2
City
Redmond
Team
Microsoft Superintelligence
Country
United States
Discipline
Software Engineering
Overview

Microsoft AI is looking for a Member of Technical Staff, Evaluations Engineer to help build the next wave of capabilities of our personalized AI assistant, Copilot. We’re looking for someone who will bring an abundance of positive energy, empathy, and kindness to the team every day, in addition to being highly effective. The right candidate enjoys building world-class consumer experiences and products in a fast-paced environment. You will actively contribute to the development of AI models that are powering our innovative products. You will wear multiple hats and work on engineering, research, and everything in between. Your contributions will span model architecture, data curation, training and inference infrastructures, evaluation protocols, alignment and reinforcement learning from human feedback (RLHF), and many other exciting topics at the cutting edge of AI.

Microsoft AI is building foundational models to develop novel, responsible, and efficient artificial general intelligence. Foundational models demand significant compute capacity. As a Member of Technical Staff, Evaluations Engineer, you will design and build the evaluation infrastructure for generative AI on large-scale GPU clusters. This role involves developing sophisticated tools and techniques to ensure the reliability, performance, and health of hundreds of nodes across supercomputers with thousands of GPUs. You will collaborate closely with model scientists to implement state-of-the-art and novel evaluation methods, inference strategies, and metrics algorithms, enabling smooth and efficient execution of evaluation workloads. As a contributing member of the core group of engineers, you will also bring best practices that drive architectural changes and influence the roadmap of relevant software and hardware components. Your work will directly impact the business goals of a wide range of users and facilitate the next wave of growth and innovation in AI.

Our newly formed organization, Microsoft AI, is dedicated to advancing Copilot and other consumer AI products and research. The team is responsible for Copilot, Bing, Edge, and generative AI research. Come be a part of the team shaping the future of personal computing.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction. 



Responsibilities
  • Develop and tune scalable pretraining software for NVIDIA GB200 NVL72 CX8 and AMD MIxxx architectures.

  • Benchmark GB200 and AMD MIxxx GPU clusters.  

  • Gather data and insights to develop the pretraining compute roadmap.  

  • Care deeply about conversational AI and its deployment.  

  • Actively contribute to the development of AI models that are powering our innovative products.  

  • Find ways to overcome roadblocks and get your work into the hands of users quickly and iteratively.

  • Enjoy working in a fast-paced, design-driven product development cycle.

  • Embody our Culture and Values.    


Qualifications

Required qualifications

  • Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.

Preferred qualifications

  • Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.
  • Experience with generative AI.
  • Experience with distributed computing.
  • Experience in leading technical projects and supporting architectural decisions with data.   


Software Engineering IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 – $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay

Software Engineering IC6 – The typical base pay range for this role across the U.S. is USD $163,000 – $296,400 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $220,800 – $331,200 per year.


This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.




Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

New York, United States

Member of Technical Staff, Evaluations Engineering – MAI Superintelligence Team

Location
New York, United States
Job Number
200009256-en-3
City
New York
Team
Microsoft Superintelligence
Country
United States
Discipline
Software Engineering
Overview

Microsoft AI is looking for a Member of Technical Staff, Evaluations Engineer to help build the next wave of capabilities of our personalized AI assistant, Copilot. We’re looking for someone who will bring an abundance of positive energy, empathy, and kindness to the team every day, in addition to being highly effective. The right candidate enjoys building world-class consumer experiences and products in a fast-paced environment. You will actively contribute to the development of AI models that are powering our innovative products. You will wear multiple hats and work on engineering, research, and everything in between. Your contributions will span model architecture, data curation, training and inference infrastructures, evaluation protocols, alignment and reinforcement learning from human feedback (RLHF), and many other exciting topics at the cutting edge of AI.

Microsoft AI is building foundational models to develop novel, responsible, and efficient artificial general intelligence. Foundational models demand significant compute capacity. As a Member of Technical Staff, Evaluations Engineer, you will design and build the evaluation infrastructure for generative AI on large-scale GPU clusters. This role involves developing sophisticated tools and techniques to ensure the reliability, performance, and health of hundreds of nodes across supercomputers with thousands of GPUs. You will collaborate closely with model scientists to implement state-of-the-art and novel evaluation methods, inference strategies, and metrics algorithms, enabling smooth and efficient execution of evaluation workloads. As a contributing member of the core group of engineers, you will also bring best practices that drive architectural changes and influence the roadmap of relevant software and hardware components. Your work will directly impact the business goals of a wide range of users and facilitate the next wave of growth and innovation in AI.

Our newly formed organization, Microsoft AI, is dedicated to advancing Copilot and other consumer AI products and research. The team is responsible for Copilot, Bing, Edge, and generative AI research. Come be a part of the team shaping the future of personal computing.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction. 



Responsibilities
  • Develop and tune scalable pretraining software for NVIDIA GB200 NVL72 CX8 and AMD MIxxx architectures.

  • Benchmark GB200 and AMD MIxxx GPU clusters.  

  • Gather data and insights to develop the pretraining compute roadmap.  

  • Care deeply about conversational AI and its deployment.  

  • Actively contribute to the development of AI models that are powering our innovative products.  

  • Find ways to overcome roadblocks and get your work into the hands of users quickly and iteratively.

  • Enjoy working in a fast-paced, design-driven product development cycle.

  • Embody our Culture and Values.    


Qualifications

Required qualifications

  • Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.

Preferred qualifications

  • Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.
  • Experience with generative AI.
  • Experience with distributed computing.
  • Experience in leading technical projects and supporting architectural decisions with data.   


Software Engineering IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 – $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay

Software Engineering IC6 – The typical base pay range for this role across the U.S. is USD $163,000 – $296,400 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $220,800 – $331,200 per year.


This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.




Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

Mountain View, United States

Member of Technical Staff, LLM Inference – MAI Superintelligence Team

Location
Mountain View, United States
Job Number
200009235-en-1
City
Mountain View
Team
Microsoft Superintelligence
Country
United States
Discipline
Software Engineering
Overview

Our Inference team is responsible for building and maintaining the tools and systems that enable Microsoft AI researchers to run models easily and efficiently. Our work empowers researchers to run models in RL, synthetic data generation, evals, and more. We are joint stewards of one of the largest compute fleets in the world. 

The team is responsible for optimizing compute efficiency on our heterogeneous data centers as well as enabling cutting-edge research and production deployment. We are an applied research team that is embedded directly in Microsoft AI’s research org to work as closely as possible with researchers. We are vertically integrated, owning everything from kernels to architecture co-design to distributed systems to profiling and testing tools.

This role could be a great match for you if you:
  • Understand modern generative AI architectures and how to optimize them for inference.
  • Are familiar with the internals of open-source inference frameworks like vLLM and SGLang.
  • Value clear communication, improving team processes, and being a supportive team player.
  • Are results-oriented, have a bias toward action, and enjoy owning problems end-to-end.
  • Have, or can quickly gain, familiarity with modern Python and its tooling, PyTorch, NVIDIA GPU kernel programming and optimization, InfiniBand, and NVLink.

Our newly formed parent organization, Microsoft AI (MAI), is dedicated to advancing Copilot and other consumer AI products and research. The team is responsible for Copilot, Bing, Edge, and AI research.

Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction.



Responsibilities
  • Work alongside researchers and engineers to implement frontier AI research ideas.
  • Introduce new systems, tools, and techniques to improve model inference performance.
  • Build tools to help debug performance bottlenecks, numeric instabilities, and distributed systems issues.
  • Build tools and establish processes to enhance the team’s collective productivity.
  • Find ways to overcome roadblocks and deliver your work to users quickly and iteratively.
  • Enjoy working in a fast-paced, design-driven product development cycle.
  • Embody our Culture and Values.    


Qualifications

Required qualifications

  • Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.

Preferred qualifications

  • Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.
  • Experience with generative AI.
  • Experience with distributed computing.
  • Expertise in Python and the Python ecosystem (e.g., uv, pybind/nanobind, FastAPI).
  • Experience with large-scale production inference.
  • Experience with GPU kernel programming.
  • Experience benchmarking, profiling, and optimizing PyTorch generative AI models.
  • Experience with open-source inference frameworks like vLLM and SGLang.
  • Working experience with, and fluency in, the material in the JAX scaling book.


Software Engineering IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 – $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay

Software Engineering IC6 – The typical base pay range for this role across the U.S. is USD $163,000 – $296,400 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $220,800 – $331,200 per year.


This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.




Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
