MTS – Site Reliability Engineer
Responsibilities
- Reliability & Availability: Ensure uptime, resiliency, and fault tolerance of AI model training and inference systems.
- Observability: Design and maintain monitoring, alerting, and logging systems to provide real-time visibility into model serving pipelines and infrastructure.
- Performance Optimization: Analyze system performance and scalability, optimize resource utilization (compute, GPU clusters, storage, networking).
- Automation & Tooling: Build automation for deployments, incident response, scaling, and failover in hybrid cloud/on-prem CPU+GPU environments.
- Incident Management: Lead on-call rotations, troubleshoot production issues, conduct blameless postmortems, and drive continuous improvements.
- Security & Compliance: Ensure data privacy, compliance, and secure operations across model training and serving environments.
- Collaboration: Partner with ML engineers and platform teams to improve developer experience and accelerate research-to-production workflows.
Qualifications
Required Qualifications
- 4+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
Preferred Qualifications
- Strong proficiency in Kubernetes, Docker, and container orchestration.
- Knowledge of CI/CD pipelines for inference and ML model deployment.
- Hands-on experience with public cloud platforms like Azure/AWS/GCP and infrastructure-as-code.
- Expertise in monitoring & observability tools (Grafana, Datadog, OpenTelemetry, etc.).
- Strong programming/scripting skills in Python, Go, or Bash.
- Solid knowledge of distributed systems, networking, and storage.
- Experience running large-scale GPU clusters for ML/AI workloads (preferred).
- Familiarity with ML training/inference pipelines.
- Experience with high-performance computing (HPC) and workload schedulers (Kubernetes operators).
- Background in capacity planning & cost optimization for GPU-heavy environments.
- Work on cutting-edge infrastructure that powers the future of Generative AI.
- Collaborate with world-class researchers and engineers.
- Impact millions of users through reliable and responsible AI deployments.
- Competitive compensation, equity options, and comprehensive benefits.
Software Engineering IC4 – The typical base pay range for this role across the U.S. is USD $119,800 – $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 – $258,000 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
Software Engineering IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 – $304,200 per year.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
Principal Applied Science Manager
Do you have a keen interest in applying cutting-edge science to solve real-world problems at scale? Do you thrive in environments where you work with massive datasets and advanced machine learning techniques? Are you excited by the challenge of building intelligent systems that process trillions of records to deliver impactful experiences for millions of users?
At Microsoft AI, we are redefining what’s possible with data and AI. We’re seeking a Principal Applied Science Manager to help design and develop the next generation of big data and AI-driven capabilities.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week. This expectation is subject to local law and may vary by jurisdiction.
Responsibilities
- Drive strategic impact by identifying and leading high-leverage data science and analytics initiatives across multiple teams.
- Drive model inference optimization and system integration to reduce cost and latency.
- Apply advanced statistical modeling, machine learning, and analytics techniques to tackle complex problems such as fraud/anomaly detection, opportunity identification, and business impact analytics.
- Design E2E solutions and drive projects from concept to production in a fast-paced, dynamic environment.
Qualifications
Required Qualifications:
- Bachelor’s Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 6+ years related experience (e.g., statistics, predictive analytics, research)
- OR Master’s Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
- OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research).
- OR equivalent experience.
- 1+ year(s) of people management experience.
Other Requirements:
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- 3+ years of people management experience.
- 3+ years of experience in statistics and machine learning, including deep learning, NLP, and econometrics.
- Experience moving applied research into shipped product features.
- Experience structuring unscoped problems, defining success metrics, and driving execution under uncertainty.
- Analytical mindset with a data-driven approach to problem-solving, consistently upholding high standards of scientific rigor.
- Proven track record of owning and delivering technically challenging projects with measurable impact.
#MicrosoftAI
Applied Sciences M5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 – $304,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
We’re building the next-generation Grounding Service that powers the latest AI applications—chat assistants, copilots, and autonomous agents—with factual, cited, and trustworthy responses. Our platform stitches together retrieval, reasoning, and real-time data so that large language models stay anchored to enterprise knowledge, the public web, and proprietary tools.
We’re looking for a Principal Applied Scientist to lead end-to-end science for grounding: inventing retrieval and attribution methods, defining factuality/faithfulness metrics, and shipping production models and APIs that scale to billions of queries. You’ll partner closely with engineering, product, research, and customers to deliver fast, reliable, and explainable answers with source citations across a diverse set of domains and modalities.
As a team, we value curiosity, pragmatic rigor, and inclusive collaboration. We believe great systems emerge when scientists and engineers co-design metrics, models, and infrastructure—and when we obsess over customer impact, privacy, and safety.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week. This expectation is subject to local law and may vary by jurisdiction.
Responsibilities
- Owns the science roadmap for grounding—including retrieval, re-ranking, attribution, and reasoning—driving initiatives from problem framing to production impact. Designs and evolves state-of-the-art retrieval and RAG orchestration across documents, tables, code, and images.
- Builds citation and provenance systems (e.g., passage highlighting, quote-level alignment, confidence scoring) to reduce hallucinations and increase user trust. Leads experimentation and evaluation using A/B testing, interleaving, NDCG, MRR, precision/recall, and calibration curves to guide measurable trade-offs.
- Advances tool-augmented grounding through schema-aware retrieval, function calling, knowledge graph joins, and real-time connectors to databases, cloud object stores, search indexes, and the web. Partners with platform engineering to productionize models with scalable inference, embedding services, feature stores, caching, and privacy-compliant multi-tenant systems.
- Nurtures collaborative relationships with product and business leaders across Microsoft, influencing strategic decisions and driving business impact through technology. Authors white papers, contributes to internal tools and services, and may publish research to generate intellectual property.
- Bridges the gap between researchers (e.g., Microsoft Research) and development teams, applying long-term research to solve immediate product needs. Leads high-stakes negotiations to ensure cutting-edge technologies are applied practically and effectively.
- Identifies and solves significant business problems using novel, scalable, and data-driven solutions. Shapes the direction of Microsoft and the broader industry through pioneering product and tooling work.
- Mentors applied scientists and data scientists, establishing best practices in experimentation, error analysis, and incident review. Collaborates cross-functionally with PMs, research, infrastructure, and security teams to align on milestones, SLAs, and safety protocols.
- Communicates clearly through design documentation, progress updates, and presentations to executives and customers.
- Contributes to ethics and privacy policies, identifies bias in product development, and proposes mitigation strategies.
Qualifications
Required Qualifications:
- Bachelor’s Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 2+ years related experience (e.g., statistics, predictive analytics, research)
- OR Master’s Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research)
- OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field
- OR equivalent experience.
- Minimum of 2 years of hands-on experience designing and building search, retrieval, or ranking systems.
- Proven track record of shipping LLM-powered or Retrieval-Augmented Generation (RAG) systems into production environments.
- Solid coding skills and a strong foundation in machine learning, with the ability to implement and optimize models effectively.
- Demonstrated ability to lead through ambiguity, make principled trade-offs, and deliver measurable impact in cross-functional, fast-paced settings.
Preferred Qualifications:
- Bachelor’s Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 5+ years related experience (e.g., statistics, predictive analytics, research)
- OR Master’s Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
- OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research)
- OR equivalent experience.
- Minimum of 4 years of hands-on experience designing and building search, retrieval, or ranking systems.
- Demonstrated expertise in information retrieval, with publications in top-tier conferences or journals such as NeurIPS, ICML, ICLR, SIGIR, or ACL.
- Hands-on experience in large language model (LLM) development, including pretraining, supervised fine-tuning (SFT), and reinforcement learning (RL).
- Proven track record in optimizing LLM inference, or active contributions to open-source frameworks like vLLM, SGLang, or related projects.
#MicrosoftAI
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Technical Program Manager
The Microsoft AI Monetization team plays a pivotal role in shaping how Microsoft generates value from its AI innovations, particularly through products like Copilot, Bing, Edge, and Microsoft Advertising. This cross-functional organization is responsible for driving new monetization opportunities, business growth, and pioneering bold bets in AI-driven experiences.
Join our high-impact Program Management (PGMT) team and help lead execution across Microsoft AI Monetization. Our PGMTs are embedded in the heart of strategic initiatives that power the AI platform shift, create new canvases for monetization growth, and enable publisher platforms, real-time systems, and state-of-the-art personalized models. We operate at the intersection of engineering, product, design, and business, ensuring that complex programs are delivered with a customer focus, strategic clarity, and measurable impact.
As a Principal Technical Program Manager, you will play a crucial role in helping manage the core infrastructure and services behind AI monetization at scale. In this role, you will collaborate on and drive high-impact, data-intensive projects with deep AI/ML integration, leading critical cross-org initiatives from problem statement to production launch. You’ll apply a data-driven, customer-oriented focus to understand customer needs and help set realistic customer expectations, define business, customer, and solution strategy goals, and partner with others to identify and explore new opportunities. If you thrive in high-scale environments and are passionate about systems that make a measurable impact, we want to hear from you.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week. This expectation is subject to local law and may vary by jurisdiction.
Responsibilities
- Lead and deliver complex, cross-functional technical programs that drive value for the Microsoft Ads business.
- Set vision and strategy for technical programs, ensuring alignment with business goals and long-term impact.
- Partner with multiple engineering teams to design and implement scalable solutions across multiple feature areas.
- Define success criteria, track project schedules, and ensure alignment across business and technical stakeholders.
- Drive execution of roadmap items, including staging, implementation, and governance for multiple feature groups.
- Identify and resolve project dependencies and blockers, proposing solutions to keep initiatives on track.
- Identify, communicate, and mitigate risks at scale, ensuring program success.
- Apply a metric-driven approach to measure impact and continuously improve processes and outcomes.
- Conduct thorough reviews of data analysis, modeling techniques, and identify new evaluation methods.
- Update internal best practices for data collection and preparation, and contribute to data integrity conversations.
- Lead programs across global, distributed teams and manage cross-geo execution.
- Champion a culture of quality, rigor, and truth-seeking—ensuring solutions are robust, scalable, and grounded in data and facts.
Qualifications
Required Qualifications:
- Bachelor’s Degree AND 6+ years experience in engineering, product/technical program management, data analysis, or product development
- OR equivalent experience.
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- Bachelor’s Degree AND 12+ years experience in engineering, product/technical program management, data analysis, or product development
- OR equivalent experience.
- 8+ years of experience managing cross-functional and/or cross-team projects.
- 3+ years of experience writing code (e.g., product demos, ad-hoc analysis using SQL, Kusto, Python, etc.).
- Experience with high-scale, data-intensive systems, especially in advertising technology (knowledge of the Azure data stack or other big data platforms is a plus).
- Expertise with AI/ML recommendation systems at scale.
- Proven ability to drive metric- and impact-driven outcomes.
- 3+ years of experience managing cross-functional and/or cross-team projects.
- 2+ years of data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results).
- 3+ years of experience building large-scale personalization and recommendation systems for content, advertising, or similar domains.
#MicrosoftAI
Technical Program Management IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 – $304,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Member of Technical Staff – Health AI
Responsibilities
- Deep, full-stack expertise in designing and evaluating AI applications. Evidence of this may include research papers at top AI conferences and journals, open source projects, industry experience in building production AI stacks.
- Strong intuition about pre/post training, metric design for AI, prompt engineering methodologies, and AI systems design.
- Demonstrated experience in one or more of the following areas: prompt engineering, experimental design, language model evaluations, fine tuning, reinforcement learning/direct preference optimization, data curation, and classic machine learning principles.
Qualifications
- Bachelor’s Degree in Computer Science, or related technical discipline AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
- Demonstrated full-stack experience in large-scale AI. Empirical evidence of this in the form of top tier publications, open source contributions, and/or on-the-job work experience.
- Deeper expertise in one or more parts of the AI stack, including prompt engineering, pre-training, fine-tuning, reinforcement learning and direct preference optimization, data curation, LLM inference, orchestration, evaluation pipelines, and deployment.
- Ability to flex across research and engineering boundaries, wearing a bit of both hats.
- Passionate about conversational AI and its deployment.
- Demonstrated written and verbal communication skills with the ability to work closely with cross-functional teams, including product managers, designers, and other engineers.
- Passion for learning new technologies and staying up to date with industry trends, best practices, and emerging technologies in AI.
- Proven ability to collaborate and contribute to a positive, inclusive work environment, fostering knowledge sharing and growth within the team.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Senior Software Engineer
With the rapid expansion of digital data and the increasing need to harness it to solve real-world challenges, Microsoft’s Feeds & AI organization is scaling to meet these demands. The Unified Data Platform (UDP) team is seeking talented engineers to help shape the future of intelligent content delivery. If you’re eager to work with cutting-edge AI technologies and make a meaningful impact on the news and content experiences of billions of users across MSN, Windows, and Copilot, this is an exceptional opportunity to join us.
As a Senior Software Engineer in the Unified Data Platform (UDP) team, you will design and implement scalable data systems that power our recommendation rankers, enabling personalized experiences for billions of users. You’ll collaborate with data scientists and program managers to build efficient data pipelines, optimize large-scale data processing, and ensure the reliability of our machine learning infrastructure. This opportunity will allow you to accelerate your technical growth, deepen your expertise in data-driven personalization, and gain hands-on experience with cutting-edge data technologies.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week. This expectation is subject to local law and may vary by jurisdiction.
Responsibilities
– Partners with appropriate stakeholders to determine user requirements for one or more complex scenarios.
– Provides technical leadership for the identification of dependencies and the development of design documents for a product, application, service, or platform.
– Leads by example and mentors others to produce extensible and maintainable code used across the company.
– Leverages deep subject-matter expertise of cross-product features with appropriate stakeholders (e.g., project managers) to lead multiple products’ project plans, release plans, and work items.
– Holds accountability as a Designated Responsible Individual (DRI), mentoring engineers across products/solutions, working on-call to monitor system/product/service for degradation, downtime, or interruptions.
– Proactively seeks new knowledge and adapts to new trends, technical solutions, and patterns that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale and shares knowledge with other engineers.
Qualifications
Required Qualifications:
– Bachelor’s Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
– OR equivalent experience.
– 4+ years of experience demonstrating proficient technical design, problem-solving, and debugging skills.
Preferred Qualifications:
– Master’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
– OR Bachelor’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
– OR equivalent experience.
Software Engineering IC4 – The typical base pay range for this role across the U.S. is USD $119,800 – $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 – $258,000 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
MTS – Site Reliability Engineer
Responsibilities
- Reliability & Availability: Ensure uptime, resiliency, and fault tolerance of AI model training and inference systems.
- Observability: Design and maintain monitoring, alerting, and logging systems to provide real-time visibility into model serving pipelines and infra.
- Performance Optimization: Analyze system performance and scalability, optimize resource utilization (compute, GPU clusters, storage, networking).
- Automation & Tooling: Build automation for deployments, incident response, scaling, and failover in hybrid cloud/on-prem CPU+GPU environments.
- Incident Management: Lead on-call rotations, troubleshoot production issues, conduct blameless postmortems, and drive continuous improvements.
- Security & Compliance: Ensure data privacy, compliance, and secure operations across model training and serving environments.
- Collaboration: Partner with ML engineers and platform teams to improve developer experience and accelerate research-to-production workflows.
Qualifications
Required Qualifications
- 4+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
Preferred Qualifications
- Strong proficiency in Kubernetes, Docker, and container orchestration.
- Knowledge of CI/CD pipelines for inference and ML model deployment.
- Hands-on experience with public cloud platforms like Azure/AWS/GCP and infrastructure-as-code.
- Expertise in monitoring & observability tools (Grafana, Datadog, OpenTelemetry, etc.).
- Strong programming/scripting skills in Python, Go, or Bash.
- Solid knowledge of distributed systems, networking, and storage.
- Experience running large-scale GPU clusters for ML/AI workloads.
- Familiarity with ML training/inference pipelines.
- Experience with high-performance computing (HPC) and workload schedulers (e.g., Kubernetes operators).
- Background in capacity planning & cost optimization for GPU-heavy environments.
- Work on cutting-edge infrastructure that powers the future of Generative AI.
- Collaborate with world-class researchers and engineers.
- Impact millions of users through reliable and responsible AI deployments.
- Competitive compensation, equity options, and comprehensive benefits.
Software Engineering IC4 – The typical base pay range for this role across the U.S. is USD $119,800 – $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 – $258,000 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
Software Engineering IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 – $304,200 per year.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
Member of Technical Staff, Platform Engineer – Windows Copilot
Responsibilities
- Design and develop secure and performant platform services that support Copilot experiences on Windows.
- Work collaboratively with platform, infrastructure, and application engineers and researchers to build next-generation AI products and services.
- Ship high-quality, well-tested, secure, and maintainable code.
- Overcome obstacles to deliver work quickly and iteratively to users.
- Enjoy working in a fast-paced, design-driven, product development cycle.
- Embody our Culture and Values.
Qualifications
- Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
- 6+ years’ experience building scalable services on top of public cloud infrastructure like Azure, AWS, or GCP with extensive use of various datastores like RDBMS, key-value stores, etc.
- 6+ years’ experience building distributed systems at scale and extensive systems knowledge that spans bare-metal hosts (physical servers) to containers to networking.
- Experience working with AI platforms, frameworks, and APIs.
- Ability to identify, analyze, and resolve complex technical issues, ensuring optimal performance, scalability, and user experience.
- Dedication to writing clean, maintainable, and well-documented code with a focus on application quality, performance, and security.
- Demonstrated interpersonal skills and ability to work closely with cross-functional teams, including product managers, designers, and other engineers.
- Ability to clearly communicate complex technical concepts to both technical and non-technical stakeholders.
- Ability to work in a fast-paced environment, manage multiple priorities, and adapt to changing requirements and deadlines.
- Proven ability to collaborate and contribute to a positive, inclusive work environment, fostering knowledge sharing and growth within the team.
Software Engineering IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 – $304,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
Member of Technical Staff – Data Engineering Manager – Microsoft AI – Copilot
As Microsoft continues to push the boundaries of AI, we are on the lookout for individuals to work with us on the most interesting and challenging AI questions of our time. Our vision is bold and broad — to build systems that have true artificial intelligence across agents, applications, services, and infrastructure. It’s also inclusive: we aim to make AI accessible to all — consumers, businesses, developers — so that everyone can realize its benefits.
Microsoft AI (MS AI) is seeking an experienced Member of Technical Staff – Data Engineering Manager – Microsoft AI – Copilot to help build mission-critical data pipelines that ingest, process, and publish data streams from our personal AI, Copilot systems. We’re looking for someone who possesses technical prowess, a methodical approach to problem-solving, proficiency in big data processing technologies, and a mastery of templating to architect solutions that stand the test of time, and who will bring an abundance of positive energy, empathy, and kindness to the team every day, in addition to being highly effective. The Data Platform Engineering team is responsible for building core data pipelines that help fine-tune models and support introspection and retrospection of data so that we can constantly evolve and improve human-AI interactions.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction.
Responsibilities
- Build scalable data pipelines for sourcing, transforming and publishing data assets for AI use cases.
- Work collaboratively with other platform, infrastructure, and application engineers, as well as AI researchers, to build next-generation data platform products and services.
- Ship high-quality, well-tested, secure, and maintainable code.
- Find a path to get things done despite roadblocks to get your work into the hands of users quickly and iteratively.
- Enjoy working in a fast-paced, design-driven, product development cycle.
- Embody our Culture and Values.
- Manage a small team of 4 or 5 senior data engineers, including talent management. Own engineering and operational excellence for the data platform, including PR reviews, code quality, and engineering productivity.
Qualifications
Required Qualifications:
- Bachelor’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years experience in business analytics, data science, software development, data modeling or data engineering work
- OR Master’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years experience in business analytics, data science, software development, or data engineering work
- OR equivalent experience.
- 4+ years technical engineering experience building data processing applications (batch and streaming) with coding in languages including, but not limited to, Python, Java, Spark, SQL.
- Experience working with the Apache Hadoop ecosystem, Kafka, NoSQL, etc.
- 3+ years experience with data governance, data compliance and/or data security.
- 2+ years’ experience building scalable services on top of public cloud infrastructure like Azure, AWS, or GCP, with extensive use of datastores like RDBMS, key-value stores, etc.
- 2+ years’ experience building distributed systems at scale and extensive systems knowledge that spans bare-metal hosts to containers to networking.
- Ability to identify, analyze, and resolve complex technical issues, ensuring optimal performance, scalability, and user experience.
- Dedication to writing clean, maintainable, and well-documented code with a focus on application quality, performance, and security.
- Demonstrated interpersonal skills and ability to work closely with cross-functional teams, including product managers, designers, and other engineers.
- Ability to clearly communicate complex technical concepts to both technical and non-technical stakeholders.
- Interest in learning new technologies and staying up to date with industry trends, best practices, and emerging technologies in web development and AI.
- Ability to work in a fast-paced environment, manage multiple priorities, and adapt to changing requirements and deadlines.
Software Engineering M5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 – $304,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.