👋 I’ll be at the dbt Coalesce conference in Las Vegas, Oct 7-10, 2024. Hit me up to geek out about version control and data developer experience! We’re also hosting the Data Renegade Happy Hour on the evening of Oct 8th.
I was reviewing my Pocket queue and reading history, and thought these might be interesting to folks.
Why GitHub Actually Won by Scott Chacon, GitHub Co-founder.
TL;DR: Because the developers were building for themselves and had taste, and because the Linux and Android ecosystems and the Rails community created the demand for such use cases.
This totally threw me back in time. Git came out a while after I started working on svk, one of the earliest distributed version control systems, built on top of Subversion1. At the time it definitely felt niche - I wanted to develop in an offline and ergonomic environment.
In addition to taste, I’d add that Git and GitHub introduced two core abstractions that worked in unison. Git introduced the technical abstraction of the content-addressable Merkle tree, an elegant way to represent potentially mergeable trees. GitHub introduced the social abstraction of the pull request. This is what modern CI/CD and DevOps build upon. Without the pull request as an entity, they would just be best practices, hard to productize.
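To make the technical abstraction concrete, here is a minimal Python sketch of Git-style content addressing. The blob hash follows Git's actual object format ("blob &lt;len&gt;\0&lt;content&gt;" hashed with SHA-1); the tree hash is a simplified illustration only (real Git trees serialize file modes and binary hashes), but it shows the Merkle-tree property: a directory's identity is derived from its children's identities, so identical subtrees collapse to one object and comparing two trees is cheap.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    # Git addresses a file's content by the SHA-1 of a typed,
    # length-prefixed header plus the raw bytes.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

def tree_like_hash(entries: dict[str, str]) -> str:
    # Simplified stand-in for Git's tree object: the hash of a directory
    # depends only on the names and hashes of its children, giving the
    # Merkle-tree property that unchanged subtrees hash identically.
    body = "".join(f"{name} {h}\n" for name, h in sorted(entries.items()))
    return hashlib.sha1(b"tree " + body.encode()).hexdigest()

# Matches `echo "hello" | git hash-object --stdin`:
print(git_blob_hash(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Because two trees with the same hash are guaranteed to have identical contents, merge tools can skip entire unchanged subtrees and only walk where hashes diverge.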
The Analytics Development Lifecycle (ADLC) by Tristan Handy, dbt Labs CEO.
TL;DR: we are still early in mapping the analytics/data workflow onto the traditional SDLC.
Despite a few years of advocacy, the data industry is still slow to adopt “desirable” workflows, and the industry sentiment seems to be “data teams are struggling to prove ROI”.
While this is a comprehensive whitepaper, I can’t help but think it is somewhat TBU (true but useless). What are we doing differently as an industry to make companies that rely on data actually care more about the data workflow?
This is still a good read for the overall landscape, and a starting point for debating why we aren’t there yet. I’ve thought about this a lot ever since I worked with open government data; the data developer experience has always been brittle. dbt (and similar frameworks like SQLMesh and SDF) pioneered a great improvement by making things code-first, so that we can reuse GitHub’s pull-request social abstraction.
My take on the main difference between the two workflows: high-confidence correctness is not yet established at pull-request review time, so you get long and/or broken feedback loops with stakeholders. The way we approach “correctness” therefore needs to go beyond software testing. This is a much longer topic I want to address, but the concept is covered in this post. More interestingly, the software development workflow itself is also rapidly changing because of AI. It has become more experiment-driven, so there’s a lot for the next generation of software developers to learn from data practitioners as well.
I think we are on the cusp of a new abstraction that is needed to power the data/analytics workflow.
The Pricing Roadmap by Ulrik Lehrskov-Schmidt
This is an unusual book: it details a pricing methodology that incorporates the buyer’s perspective, not just the usual “value-based pricing good, cost-based pricing bad”. You might end up with value-based pricing in the end, but there are thought processes to work through and conscious decisions to be made along the way, validated as you go.
Until next time.
1 Recently I somehow reconnected with a few Subversion hackers. I still have the committer-only “r8810” T-shirt framed somewhere.