Senior Software Engineer

Location: Beijing, China
Job Number: 200029933-en-1
City: Beijing
Team: Other
Country: China
Discipline: Software Engineering

Overview

The R&D of Search Ads aims to build an online advertising ecosystem of users, advertisers, and the search engine.

The Bing Search Ads Understanding team is chartered to deliver world-class algorithms using web-scale data. Our mission is to drive user satisfaction, advertiser ROI, and Bing revenue. A core challenge is to match advertisers’ “Ad display” with users’ “query” by building an intelligent system that truly understands users’ needs. This is a very hard problem that demands the most advanced AI models and sophisticated engineering systems. Join us to work on projects highly strategic to Bing search in a fun and fast-paced environment!

We are hiring a Senior Software Engineer (GPU Inference Optimization) to optimize GPU inference of language models and support GPU serving of those models for Ads tasks such as query rewrite, Ad relevance, and Ad creative generation. As a member of this team, you will have the opportunity to work on the fundamental abstractions, programming models, runtimes, libraries, and APIs that enable large-scale inferencing and online serving of models on novel AI hardware.

This is a technical role focused on GPU inference optimization of language models, and it requires hands-on software development skills. We’re looking for someone who has a demonstrated history of solving hard technical problems and is motivated to tackle the hardest problems in building a full end-to-end AI stack. An entrepreneurial approach and the ability to take initiative and move fast are essential.

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

Design, develop, and maintain high-performance software in C/C++ and Python, including GPU programming with CUDA, ROCm, or Triton.
Optimize model inference and training pipelines for speed, throughput, memory efficiency, and cost across GPU platforms.
Collaborate with platform teams to integrate and tune solutions on emerging accelerator stacks and rapidly evolving toolchains.
Profile workloads end-to-end, identify bottlenecks, and implement kernel-level and system-level performance improvements.
Partner with internal and external stakeholders to translate requirements into scalable performance features and optimizations for state-of-the-art models.
Validate performance, stability, and correctness through benchmarking, automated testing, and production readiness reviews.

Qualifications

Required Qualifications:

Bachelor’s Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, CUDA, or ROCm
- OR equivalent experience.
3+ years of practical experience working on applications that use GPUs, including optimizing their performance.
Practical experience writing new GPU kernels, beyond running GPU workloads on existing library kernels.
Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers.

Preferred Qualifications:

Bachelor’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C/C++, CUDA, or ROCm
- OR Master’s Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C/C++, CUDA, or ROCm
- OR equivalent experience.
Experience in low-level performance analysis and optimization, including proficiency with GPU profiling tools such as NVIDIA Visual Profiler and NVIDIA Nsight Compute.
Technical background and solid foundation in software engineering principles and architecture design.
Familiarity with inference optimization and experience developing popular inference frameworks such as TensorRT-LLM, SGLang, or vLLM.
Exposure to Deep Neural Network inference and experience in one or more deep learning frameworks such as PyTorch, TensorFlow, or ONNX Runtime.

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
