
Job Number
1970324837016562-en-1
Overview
As Microsoft continues to push the boundaries of AI, we are on the lookout for passionate individuals to work with us on the most interesting and challenging AI questions of our time. Our vision is bold and broad — to build systems that have true artificial intelligence across agents, applications, services, and infrastructure. It’s also inclusive: we aim to make AI accessible to all — consumers, businesses, developers — so that everyone can realize its benefits.
Microsoft AI (MS AI) is seeking an experienced Data Scientist to help build the next wave of capabilities of our personal AI, Copilot. We’re looking for someone who thinks deeply about measurement and human-AI interactions. In this role, you will help us describe and measure how people use Microsoft Copilot. We seek a versatile data scientist who can architect solutions that stand the test of time and who, in addition to being highly effective, will bring an abundance of positive energy, empathy, and kindness to the team every day.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
This position is based in Mountain View, CA or Redmond, WA (U.S.). Applicants are required to be local to the San Francisco area or the Seattle area and to work in the office 3 days a week.
Responsibilities
- Develop and improve evaluation methodologies to assess model output quality, including the metrics and coverage of both machine and human evaluation.
- Design and implement scalable data pipelines to extract, transform, and structure product logs for evaluation use cases.
- Synthesize datasets for human or machine evaluation.
- Analyze and interpret results from A/B tests, offline benchmarks, and live experiments to drive actionable recommendations.
- Train ML classifiers to analyze and label user logs (e.g., classify intent, detect quality issues) for evaluation.
- Draw insights from eval results, form recommendations, and drive eval experiments to find the best solutions.
- Work closely with product managers, engineers, and researchers to define evaluation criteria aligned with product goals and user value.
- Create and maintain dashboards and reporting tools to monitor eval performance and trends.
- Contribute to the development of custom metrics that go beyond standard benchmarks to capture product-specific nuances.
- Stay current on the latest in LLM research on evaluation and prompting.
- Embody our Culture and Values.
Required Qualifications
- Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 1+ year(s) data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
- OR Master’s Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 3+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
- OR Bachelor’s Degree in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 5+ years data-science experience (e.g., managing structured and unstructured data, applying statistical techniques and reporting results)
- OR equivalent experience.
- 5+ years of experience in data science, ML evaluation, or applied research.
- Working knowledge of LLM evaluation methods, including experience conducting both human evaluations and LLM-as-a-judge assessments.
- Experience using Python, SQL, and common data analysis libraries for data processing and analysis.
- Ability to analyze complex problems, communicate findings clearly, and translate insights into actionable steps.
Preferred Qualifications
- Experience building or evaluating LLM applications in production.
- Product-driven thinking.
- Ability to work in a fast-paced environment, manage multiple priorities, and adapt to changing requirements and deadlines.
Data Science IC4 – The typical base pay range for this role across the U.S. is USD $119,800 – $234,700 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $158,400 – $258,000 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay
Microsoft will accept applications and process offers for these roles on an ongoing basis.
