
Job Number
1970324837030422-en-1
Overview
If you’ve ever watched a massive data pipeline process billions of records without breaking a sweat, felt genuine satisfaction debugging a business-critical schema migration, or gotten excited about shaving milliseconds off pipeline latency, then you’re in the right spot. We’re looking for the best data engineers: engineers who have built and scaled data systems that others depend on, who take pride in delivering rock-solid data quality, and who genuinely enjoy the craft of data engineering. If you’re the type of person who celebrates when your monitoring dashboards show all green and gets energized by the challenge of making data flow seamlessly across complex systems, this role is for you.
Join us to architect and implement the data backbone that powers Copilot for millions of users worldwide. You’ll own the full data lifecycle – from building lightning-fast ETL pipelines that handle massive scale to crafting experimentation frameworks that drive product decisions. We need someone who thrives on solving complex data challenges, loves collaborating with brilliant teammates, and gets genuinely excited about building infrastructure that just works. In our fast-paced environment, you’ll have the freedom to innovate and the support to build world-class data products that make a real impact.
Applicants for this position must be local to the San Francisco or Redmond area and in the office 3 days a week.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
- Build, maintain, and enhance data ETL pipelines for processing large-scale data with low latency and high throughput to support Copilot operations.
- Design and maintain high-throughput, low-latency experimentation reporting pipelines that enable data scientists and product teams to measure model performance and user engagement.
- Own data quality initiatives including monitoring, alerting, validation, and remediation processes to ensure data integrity across all downstream systems.
- Implement robust schema management solutions that enable quick and seamless schema evolution without disrupting downstream consumers.
- Develop and maintain data infrastructure that supports real-time and batch processing requirements for machine learning model training and inference.
- Collaborate with ML engineers and data scientists to optimize data access patterns and improve pipeline performance for model evaluation workflows.
- Design scalable data architectures that can handle growing data volumes and evolving business requirements.
- Implement comprehensive monitoring and observability solutions for data pipelines, including SLA tracking and automated alerting.
- Partner with cross-functional teams to understand data requirements and translate them into efficient technical solutions.
Qualifications
Required Qualifications
- Master’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years experience in business analytics, data science, software development, data modeling, or data engineering
- OR Bachelor’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years experience in business analytics, data science, software development, data modeling, or data engineering
- OR equivalent experience.
- Experience building and maintaining production data pipelines at scale using technologies such as Apache Spark, Kafka, or similar distributed processing frameworks.
- Experience writing production-quality Python, Scala, or Java code for data processing applications.
- Experience building and scaling experimentation frameworks.
- Experience with cloud data platforms (Azure, AWS, or GCP) and their data services.
- Experience with schema management and data governance practices.
Preferred Qualifications
- Experience with real-time data processing and streaming architectures.
- Experience with data orchestration frameworks such as Airflow, Prefect, Dagster, or similar workflow management systems.
- Experience with containerization technologies (Docker, Kubernetes) for data pipeline deployment.
- Demonstrated experience with data quality frameworks and monitoring solutions.
Data Engineering IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. A different range applies in specific work locations within the San Francisco Bay Area and the New York City metropolitan area; the base pay range for this role in those locations is USD $188,000 – $304,200 per year.
Data Engineering IC6 – The typical base pay range for this role across the U.S. is USD $163,000 – $296,400 per year. A different range applies in specific work locations within the San Francisco Bay Area and the New York City metropolitan area; the base pay range for this role in those locations is USD $220,800 – $331,200 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay
Microsoft will accept applications and process offers for these roles on an ongoing basis.
