Engineering Job: Senior Software Engineer, Profiling Services

Senior Software Engineer, Profiling Services

Santa Clara, California, United States

Category: Software Engineering

Description

Are you ready to innovate in GPU performance analysis for Machine Learning workloads? Join our Developer Tools Always-On Profiling (AON) team as a Senior Software Architect, where you'll be pivotal in designing, implementing, and leading our Always-On Profiling service. This role demands deep technical expertise, a proven track record of solving ambiguous challenges, and strong technical leadership skills.

What you’ll be doing:

Architect and Build Scalable Systems: Drive the design and implementation of the AON profiling service's core systems. You'll master inter-process communication (IPC), memory management, and building low-overhead architectures to handle profiling data from complex multi-node, multi-process, multi-GPU, and cluster environments.
Elevate Software Engineering Excellence: Promote high standards in software development, including design patterns, concurrency, parallelism, and advanced debugging for asynchronous systems. Our commitment to code quality and robust testing ensures a reliable profiling service.
Lead, Mentor, and Innovate: Guide and mentor engineers, provide impactful code reviews, and shape technical roadmaps. Proactively identify complex technical issues within the AON project, break them down, and craft innovative solutions. Your problem-solving prowess will be crucial for AON's success with ML workloads.
Architect and Build High-Performance Platforms: Transform user needs into clear requirements and design documents. Explore diverse approaches to problems, making well-reasoned recommendations. Lead end-to-end feature development—from planning and prototyping to implementation, testing, and customer evaluation. This involves hands-on development across user applications, drivers, performance counter libraries, and lower-level platform/hardware abstraction layers.
Collaborate Across Boundaries: Partner effectively with diverse internal and external teams. Exceptional communication and collaboration skills are key to integrating AON seamlessly into the broader profiling and ML ecosystem.

What we need to see:

BS or MS degree in Computer Engineering, Computer Science, or a related field, or equivalent experience.
8+ years of meaningful software development experience in C, C++, and Python.
10+ years in system software design, operating systems fundamentals, computer architectures, performance analysis, and delivering production-quality software.
Strong interpersonal, verbal, and written communication, demonstrating the ability to build cross-organizational partnerships and lead technical teams through complex challenges.
Profiling & Performance Tools Expert: Extensive knowledge of profiling technologies (sampling, tracing), overhead analysis, and diverse profiling data (CPU/GPU events, performance counters, API traces, event correlation). Familiarity with existing profiling ecosystems and their limitations is a plus.
GPU & CUDA Proficiency: In-depth knowledge of CUDA APIs, runtime, streams, kernels, and GPU architecture.
ML Ecosystem & Performance Analysis: Familiarity with ML frameworks such as PyTorch and JAX, and knowledge of performance analysis for AI training/inference applications.
Large-Scale System Development & Debugging: Experience developing and debugging across complex multi-layered software systems, including user mode and kernel drivers, with a proven ability to contribute to and extend substantial codebases (100s of millions of lines).
Proficiency in Designing APIs and Interfaces for Profiling Tools: The ability to design robust, flexible APIs and interfaces that enable seamless integration of profiling tools with various frameworks and custom code.
Mastery of Problem Simplification: A history of breaking down ill-defined problems in complex technical domains, designing effective solutions, and leading teams to implement them.

Ways to stand out from the crowd:

Pioneering Low-Overhead Profiling Systems: A track record of designing and implementing profiling systems with minimal performance impact on target workloads, especially in complex multi-process and distributed environments.
Deep Understanding of PyTorch Internals & CUDA Usage: A comprehensive grasp of how PyTorch uses CUDA, including tensor memory, operations, and distributed training functionalities.
GPU Performance Analysis & Optimization Acuity: The ability to analyze profiling data and translate it into concrete, actionable insights, particularly within CUDA and ML Frameworks like PyTorch.
Translating Customer Needs: Skilled at redefining customer requests into actionable use cases and requirements.
Strong understanding of system security principles.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Company

NVIDIA

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

Similar jobs

Software Validation Engineer

We create smart innovations to meet the mobility challenges of today and tomorrow. We design and manufacture a complete range of transportation...

Alstom December 18, 2025

Graduate Software Engineer

Job title: Graduate Digital Intelligence Software Engineer Location: Leeds We offer a range of hybrid and flexible working arrangements,...

BAE Systems December 18, 2025

Senior Software Engineer, AI Storage Infrastructure

NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized...

NVIDIA December 18, 2025

Senior Full-Stack Software Engineer

We are looking for a Senior Full-Stack Software Engineer to join our Metropolis Blueprint team. NVIDIA Metropolis is a vision AI application platform...

NVIDIA December 18, 2025

Sr. Software Engineer, Factory Software

What to Expect Tesla’s Factory Software department is currently seeking a Senior Software Engineer to focus on improving our in-house...

Tesla December 18, 2025
