dataform-and-dbt/ #8
Comments
Thank you for your reflective article. After reading it I know now which tool I am gonna choose. Although I am a seasoned GCP developer I will go for dbt. The database in our current application is Azure DWH. And also I prefer jinja over javascript. |
Hi @Ture2019 thanks for the comment, and feedback. I think Snowflake vs BigQuery is possibly a matter of preference, as I've seen reviews favouring both for price and performance in different contexts. I can definitely say that Snowflake is especially good for usability (no maintenance whatsoever), but this is very much in line with BigQuery, and so again I say that it is probably preference or context that dictates which one is a better choice. I work with both almost daily and think they are both excellent! Fivetran is worth considering for the peace of mind; I will only consider alternatives if there is a cost sensitivity or no supported connector, but only as an Extract and Load tool (the T for transform is for dbt/Dataform). But I have not used Dataflow, so perhaps there is overlap. Finally - the quote "no one is going to second guess" is maybe a poor phrase that I chose - by that I mean that those are perfectly acceptable choices, but not strictly the absolute best choice! |
Hey Matt, thanks for the insightful article! I was particularly struck by your description of some of the people getting started with the modern data stack:
There's definitely some set of users that are getting involved with data pipelines and don't want to touch a command line or bother a data engineer, and dbt Cloud + Snowflake + SQL makes that possible. But the moment they want to do anything outside of SQL (such as building a predictive model) they need to set up Airflow/another orchestrator, which is probably not inside the expertise of business analysts or even data scientists. Do you think there are people running into that barrier? Or is no/low code BI most of their focus? |
Hey @rachtsingh thanks for the question! I definitely think this is a huge barrier that many are hitting. I think you've picked up on something very important. The architecture of the SQL use cases is so clear, but stepping outside that world leads to a very messy place indeed. This is neatly reflected by this Twitter thread from the team at fal. If you look at that unbundling of Airflow (excluding the dbt+ELT), you can see a huge number of additional tools that are variously trying to fill this gap. I think the simplest answer is that this space is currently too messy for there to be a simple "use this tool" approach, but I do have two relatively clear thoughts:
|
What's the rationale for this statement? Has it been announced that Dataform will become GCP-exclusive or is that implicit as part of their acquisition? |
@james-mead it was announced to customers and probably publicly somewhere. |
Dataform is in private preview for BigQuery 🎆 |
Thanks for sharing this article, it was really interesting and digestible even for me as a Junior Data Engineer who is just starting out and is getting to grips with all the tools and ideas in the data landscape! I'm wondering, if you had a situation where your data was split across Redshift and BigQuery (or any other platforms), would you advise using DBT over Dataflow since it would provide support for both? |
Hey @JustinMyerson no problem, thanks for commenting! I would suggest starting with dbt over DataFORM, as it might cater better to your situation, though be aware that it might not help in creating the unified project you are hoping for: |
Just a quick note, you say that it's GCP only, but in the GCP Docs I see this:
It also links to Which in turn informs us:
|
Thanks @RichMarmalite, I specify near the top and again in the TLDR that I'm discussing the cloud versions. Both are open-source core, and hopefully the comments now make that clear. Appreciate the comment 👍
|
Great article, Matt! I've been following the dbt and Dataform evolution very closely as well. I'm a strong believer in competition, and do believe it is a good thing to have a dbt alternative out there. I'm curious about Dataform vs dbt traction, and how the Google acquisition will impact the Open Source version of Dataform. I know Dataform closed their Slack community, and it's now limited to @dataform.co and @google.com domains. Is there any other community for people developing Dataform worth joining? |
Thanks @elyisu (and for making a GitHub account to comment?) I think it is likely seen as a feature for BigQuery and thus requires little cultivation around OS adoption for it to remain viable. Is there any precedent? I think Malloy is an interesting one, an OS query language with a community, on the back of a product acquired by Google. This makes me think it is within policy to allow a community. I don't know of any communities! (The Dataform one is open to existing accounts afaik) |
Hey Matt, thanks a lot for your post, I share your excitement in seeing how both DBT and Dataform will evolve! Would be awesome to see a follow-up post maybe a year from now to see what has changed. An additional note I would like to make is about unit testing in data pipelines. Nowadays data pipelines can get quite complex and long, which brings the need for proper testing. A big plus for Dataform is that this is integrated into the ecosystem: https://docs.dataform.co/guides/tests. For DBT this has been made possible through a package, which I believe is suboptimal (I would rather have this available out of the box in DBT): https://github.com/EqualExperts/dbt-unit-testing Two other things I find very promising about DBT over Dataform are the two materializations that DBT offers and Dataform does not. By this I mean:
|
Hey @the-serious-programmer , thanks for the comment! Perhaps in a year or two I'll have some useful insights to share. Agree on both of those points, Dataform never really got beyond the basics with materialisations, perhaps they will more tightly align with BQ now that they can focus on it. |
New dbt IDE, and hopefully much more: |
Another one! Thanks Pedram |
SQLMesh looks really cool |
dataform and dbt - Matt Arderne
A quick rundown on two of the “indicative-of-the-future” SQL tools in data analytics at the moment. Dataform and dbt. Welcome to my third post, one I have wanted to write from the beginning. Getting these posts done isn’t easy, and the time between publishing is a commitment that I undertook rather lightly. Like most good ideas, this one is late, irrelevant, and likely only to be marginally useful. That said, here is a quick rundown on two of the “indicative-of-the-future” SQL tools in data analytics at the moment
https://rdrn.dev/dataform-and-dbt/