Unlocking Data Productivity: Lessons from Nvidia's Jensen Huang
And Why We Are Not Running with Data
Last week, Nvidia briefly joined the $1 trillion market cap club. In his commencement speech at National Taiwan University (NTU), CEO Jensen Huang shared stories about accepting failure, persevering, and retreating strategically in order to focus. He advised graduates to “Run, don’t walk”, for “Either you're running for food or you are running from becoming food.”
Nvidia's early success is largely attributed to its laser focus on desktop 3D graphics, which led to its market dominance. The secret? Velocity. Velocity is as crucial in hardware as it is in data projects.
Run, Not Walk
Conventional chip companies walked, taking 18 months to design and ship each chip. Nvidia ran. It had three teams working on staggered, overlapping 18-month cycles, enabling a new product launch every six months. This meant moving testing, validation, and integration work early in the development cycle, specifically by:
Heavily investing in simulation tooling to reduce post-production debugging and costly errors.
Creating unified video drivers early in the development process, even before the chips were made.
This strategy propelled Nvidia to dominate the desktop 3D market and to expand into general accelerated computing and the mobile chip business.[1]
Shifting Left
In software development, this approach is known as “Shift Left”: moving testing and validation earlier in the development cycle creates a tight feedback loop and greatly reduces risk in production. It is now standard practice in software, but much harder to achieve in hardware. Huang proved it possible, and succeeded with it in Nvidia’s chip business.
At Computex, Huang showcased Nvidia Omniverse, a platform for designing factories collaboratively and simulating them in full. It allows virtual testing and iteration before construction, helping the manufacturing sector shift left and prevent costly errors in the real world.
This principle of collaboration and rapid iteration has proven vital in software and hardware, and is now being attempted in manufacturing. Yet in data, which now powers most businesses, it remains largely untapped.
The Rise and Evolution of Data Engineering
Why shifting left hasn't become common on the data side of modern software deserves its own discussion. The key lies in how the data engineering role has evolved over the past decades.
So why is “adopting software engineering practices in data” still a recurring topic? Aren’t data engineers just “software engineers specialized in data”? Several challenges stand in the way:
Not all data engineers come from software engineering backgrounds. In smaller organizations, analysts or data scientists might take on the role.
Teams are often stuck in “just trying to get it working” mode, scrambling to get the plumbing right with ad hoc scripts or generic orchestration tools.
Data pipelines are becoming more semantic. A single logical change to the data can pass through a combination of tools, making it hard to trace what went wrong and where.
Attributes unique to data projects, such as scale and the lack of a feedback loop, mean that software engineering tools are not directly applicable.
Data Productivity Tools
To address these challenges, the modern data stack is shifting towards code and abstractions. Tools like dbt and Dagster are shaping data infrastructure as code, enabling more rigorous development and code review processes. Data catalog tools are being reinvented to accommodate more types of metadata. Monitoring solutions like Monte Carlo make it easier to ensure data reliability in production.
This is a pivotal moment for the data stack and its tooling to shift left. However, development-time productivity is still under-invested. Feedback loops across the complex data stack are still slow, often bringing surprises and firefighting.
PipeRider: Shift-Left Data Reliability
This is why we’ve been building PipeRider, an open source data reliability toolkit. It is designed to work seamlessly with dbt, augmenting the pull request process with impact analysis. It answers an important question, “What happens to your data when you merge this change?”, in the context of downstream dashboards and ML models.
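To make “impact analysis” a little more concrete, here is a minimal sketch of one building block: given the models touched by a pull request, find every downstream model, dashboard, or ML model that could be affected. It assumes lineage is available as a simple parent map (each node listing the nodes it reads from); the node names and the helper functions `build_children` and `downstream_impact` are hypothetical and for illustration only. This is not PipeRider's code, just the lineage traversal idea in its simplest form.

```python
from collections import deque

# Hypothetical lineage: each node maps to the nodes it reads from (its parents).
# Names are illustrative only; a dbt project records comparable lineage
# when it compiles.
PARENTS = {
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "orders_enriched": ["stg_orders", "stg_customers"],
    "revenue_dashboard": ["orders_enriched"],
    "churn_model": ["stg_customers"],
}


def build_children(parents):
    """Invert a parent map into a child map (node -> nodes that read from it)."""
    children = {}
    for node, upstreams in parents.items():
        for upstream in upstreams:
            children.setdefault(upstream, []).append(node)
    return children


def downstream_impact(changed, parents):
    """Return every node reachable downstream of the changed nodes (breadth-first)."""
    children = build_children(parents)
    seen = set()
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # visit each downstream node only once
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    # A pull request touching stg_customers affects everything built on it.
    impacted = downstream_impact(["stg_customers"], PARENTS)
    print(sorted(impacted))  # prints ['churn_model', 'orders_enriched', 'revenue_dashboard']
```

The traversal only tells a reviewer where to look. A useful impact report would then pair each impacted node with some comparison of its data before and after the change, which is the harder and more valuable part.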
Our aim is to boost data productivity by enabling shift-left practices in data engineering. In doing so, we hope to help businesses unlock the true potential of their data, and empower them to run, not walk, in the data-driven age.
As LLMs and other foundation models proliferate and the industry shifts toward data-centric AI, we believe a confident, effective loop for iterating on data is more important than ever. We are also helping a few design partners explore use cases in this area.
Final Thoughts
With every business now collecting and processing more data, the need for tooling to handle this complexity is increasingly critical. Data powers machine learning models and automated systems that directly impact customer experiences.
As the data stack swings between democratization and a centralized, unified experience, we don’t know exactly what the right tooling will ultimately look like. But we have to run.
Echoing Huang's advice, whether we are running for food or running from becoming food, we must move swiftly but carefully. It's crucial to equip ourselves with the right tools for efficient iteration and collaboration. Yes, mistakes will be made, but the ability to accept failure, learn, and course-correct quickly is itself a kind of meta-running: running that lets us sprint without stumbling. After all, if you can move fast with confidence, you get many more chances to win.
If this resonates with you, feel free to reach out and share your data engineering experiences and challenges. I’d also love to hear any stories of shift-left practices in other industries!
Thanks to Hanley Weng and Jos Boumans for reading and providing feedback on an early draft.
[1] The book 'Good Strategy Bad Strategy' details these stories and ends at the crossroads between mobile chips (Tegra) and general-purpose GPUs (Tesla). Nvidia chose the latter path, as Huang described in his commencement speech.