Hi everyone!
If you finished among the top 100 participants (according to the leaderboard), please check your email. You will find a link to a form there, which you need to fill out if you want to be on the public leaderboard of the course. Please do it within one week.
If you don't want to be on the public leaderboard, just ignore the email.
If you think you should have received an email but you didn't, please let Alexey know.
We will have two interesting workshops pretty soon:
- Identity resolution - we will see how to recognize that two different accounts belong to the same user. This is often a very important task when merging multiple datasets. Sign up here: https://www.eventbrite.com/e/identity-resolution-essentials-from-a-data-scientist-tickets-654866582577
- Mage - a workflow orchestration tool, a nice alternative to Airflow and Prefect. We will see how to set up a simple pipeline with Mage. Sign up here: https://eventbrite.com/e/647017044397
Eventbrite
Identity Resolution Essentials from a Data Scientist
A Deep Dive into Identity Resolution Strategies - Nathan Wang
Here's the public top-100 leaderboard!
https://github.com/DataTalksClub/data-engineering-zoomcamp/blob/main/cohorts/2023/leaderboard.md
Thanks everyone for taking part in the course
GitHub
data-engineering-zoomcamp/cohorts/2023/leaderboard.md at main · DataTalksClub/data-engineering-zoomcamp
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering. - DataTalksClub/data-engineering-zoomcamp
We're starting a workshop about Mage. You can join it here or watch the recording later:
https://www.youtube.com/watch?v=nUfAqM2Sguc
YouTube
Data Plumbing without the 💩 - Tommy Dang
Links:
- Repo: https://github.com/mage-ai/mage_demo_project
- Demo: https://demo.mage.ai/pipelines
- Setup: https://docs.mage.ai/getting-started/setup
- Getting started docs: https://docs.mage.ai/getting-started/setup
- Git integration: https://docs.mag…
Hey everyone!
We're about to start another workshop about Mage. It might be relevant to some of you.
Here's a link to the stream: https://www.youtube.com/watch?v=JKALtxziBG0
(As always, you can watch it later)
YouTube
Make Data Magical with Mage - Matt Palmer
Links:
- https://go.mage.ai/dtc-data-magic
Free ML Engineering course: http://mlzoomcamp.com
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Hi everyone!
The next iteration of the course is starting soon (in 1.5 months). In the meantime, you probably have a ton of questions.
That's why we're organizing a Q&A stream on December 18 (Monday) at 17:00 CET, where we will answer all your questions
Sign up here: https://lu.ma/1u1jlz4x
Also, on Monday (tomorrow) we will have a workshop that many of you will find relevant. We will talk about using Terraform for setting up a data warehouse (ClickHouse)
Sign up here: https://lu.ma/5fil21de
lu.ma
Introduction to Data Engineering Zoomcamp · Luma
Live session about the upcoming Data Engineering Zoomcamp course - Alexey Grigorev
About the event
Join us for a Q&A session with Alexey Grigorev, the…
We're starting the workshop about Terraform and ClickHouse
Join now or watch later in replay
https://www.youtube.com/watch?v=YFr_5NTjv0Q
YouTube
Terraform: Reshaping the Data Engineering Experience - Andrei Tserakhau
In this workshop, Andrei Tserakhau, Tech Lead at DoubleCloud, gave a hands-on tutorial about using Terraform for data engineering projects.
He explained how to leverage Terraform to manage and automate data infrastructure, focusing on practical applications…
We're starting the Q&A stream in 30 minutes
In the meantime, you can already ask your questions here: https://app.sli.do/event/su9wCLiM9nHnCwtGBfgicX
app.sli.do
Join Slido: Enter #code to vote and ask questions
Participate in a live poll, quiz or Q&A. No login required.
Live stream: https://www.youtube.com/watch?v=91b8u9GmqB4
(Available for replay later)
Ask your questions here: https://app.sli.do/event/su9wCLiM9nHnCwtGBfgicX
YouTube
Data Engineering Zoomcamp 2024 - Pre-Launch Q&A
Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
🔗 CONNECT WITH DataTalksClub
Join the community - https://datatalks-club.slack.com/join/shared_invite/zt-2hu0sjeic-ESN7uHt~aVWc8tD3PefSlA#/shared-invite/email
Subscribe…
If you're wondering how to prepare the environment for the course, check these two videos:
- Using GitHub Codespaces: https://www.youtube.com/watch?v=XOSUt8Ih3zA&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb (Thanks Luis for recording it!)
- Using a GCP VM: https://www.youtube.com/watch?v=ae-CV2KfoN0&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb
YouTube
DE Zoomcamp 1.4.2 - Using Github Codespaces for the Course (by Luis Oliveira)
Timecodes:
00:00 Intro to GitHub Codespaces
1:05 Create a Repo
1:47 Create New Codespace
2:54 Run Codespace Locally/Desktop
3:22 GitHub Codespaces Extension
3:57 Codespaces Overview and Features
5:02 Install Terraform
6:05 Jupyter Notebook
8:54 Running Docker…
Many of you ask how much time you should devote to the course. The answer we have previously given was "it depends".
But we actually collected some data during past editions of the course that can give a more accurate answer
Here's the dataset: https://github.com/DataTalksClub/zoomcamp-analytics/tree/main/data/de-zoomcamp-2023 (the repo also contains data from other courses)
You can do some analytics and then share the results with us
In this repo you can also find a notebook from Timur, our past student and teaching assistant, who did the analysis for the first edition of ML Zoomcamp. Half of his notebook is devoted to data cleaning, but the DE Zoomcamp 2023 data is much cleaner, so most of that cleaning is no longer needed
Have fun!
GitHub
zoomcamp-analytics/data/de-zoomcamp-2023 at main · DataTalksClub/zoomcamp-analytics
Public data and analytics for our open course. Contribute to DataTalksClub/zoomcamp-analytics development by creating an account on GitHub.
We're starting today at 17:00 CET! (In approximately 7 hours from now)
You can ask your questions in advance using this link:
https://app.sli.do/event/su9wCLiM9nHnCwtGBfgicX
See you soon!
We're starting!
Watch here: https://www.youtube.com/watch?v=AtRhA-NfS24 (or later in replay)
Ask questions here: https://app.sli.do/event/su9wCLiM9nHnCwtGBfgicX
YouTube
Data Engineering Zoomcamp 2024
We're starting work on module 1 today
Content: https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/01-docker-terraform
Homework: https://github.com/DataTalksClub/data-engineering-zoomcamp/blob/main/cohorts/2024/01-docker-terraform/homework.md (due in 2 weeks)
We will share the link to the homework form soon
Happy learning!
GitHub
data-engineering-zoomcamp/01-docker-terraform at main · DataTalksClub/data-engineering-zoomcamp
The form for submitting homework 1: https://courses.datatalks.club/de-zoomcamp-2024/homework/hw01
(Please ignore the "This homework is already scored. You didn't submit your answers." part - it's a bug which we'll fix later)
This platform is under active development, but hopefully you won't have any problems.
If you come across a problem, you can report it in the #course-management-platform channel
If you're interested in how it works, the code is here: https://github.com/DataTalksClub/course-management-platform
GitHub
GitHub - DataTalksClub/course-management-platform: Django-based course management platform for Zoomcamps
How is it going with module 1?
Anonymous Poll
- 57%: Not started yet
- 25%: Halfway through
- 8%: Almost done
- 7%: Finished everything
- 4%: Not taking the course
We see that most of the problems you have are already covered in the FAQ, so please check it before posting a message in Slack
Here are the guidelines for asking for help:
https://github.com/DataTalksClub/data-engineering-zoomcamp/blob/main/asking-questions.md
If you follow these guidelines, it makes it much easier to help you
GitHub
data-engineering-zoomcamp/asking-questions.md at main · DataTalksClub/data-engineering-zoomcamp
Does anyone still have problems signing up to the course management platform to submit the homework?
Even if you haven't finished the homework yet, please try to sign up now, so you don't have any last-minute problems.
If you log in with Slack and it doesn't work, please try another authentication provider. If that doesn't work either, please send your email address to Alexey Grigorev in a DM
Here's the link: https://courses.datatalks.club
Today we start module 2 about workflow orchestration and Mage!
Course materials: https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/02-workflow-orchestration
Homework: https://github.com/DataTalksClub/data-engineering-zoomcamp/blob/main/cohorts/2024/02-workflow-orchestration/homework.md
Form for submitting: https://courses.datatalks.club/de-zoomcamp-2024/homework/hw2
Deadline: February 5th (Monday), 23:00 CET
Also, we're extending the deadline for homework 1 by a couple of days. You can submit it until Wednesday, 23:00 CET
GitHub
data-engineering-zoomcamp/02-workflow-orchestration at main · DataTalksClub/data-engineering-zoomcamp
"What's homework URL?"
It's a very common question. I know you don't have time to check the FAQ, so I'm quoting Michael here:
It's your repository or any other location you have your code where a reasonable person would look at it and think yes, you went through the week and exercises.
By the way, here's the link to the FAQ:
https://docs.google.com/document/d/19bnYs80DwuUimHM65UV3sylsCn2j1vziPOwzBwQrebw/edit?usp=sharing
Google Docs
Data Engineering Zoomcamp FAQ
Data Engineering Zoomcamp FAQ The purpose of this document is to capture Frequently asked technical questions Editing guidelines: When adding a new FAQ entry, make sure the question is “Heading 2” Feel free to improve if you see something is off Don’t change…