Data Engineering Zoomcamp
Hi everyone!

If you finished in the top 100 (according to the leaderboard), please check your email. You will find a link to a form there. Fill it out within one week if you want to appear on the public leaderboard of the course.

If you don't want to be on the public leaderboard, just ignore the email.

If you think you should have received an email but you didn't, please let Alexey know.
We will have two interesting workshops pretty soon:

- Identity resolution - we will see how to recognize that two different accounts belong to the same user. This is often a very important task when merging multiple datasets. Sign up here: https://www.eventbrite.com/e/identity-resolution-essentials-from-a-data-scientist-tickets-654866582577

- Mage - a workflow orchestration tool, a nice alternative to Airflow and Prefect. We will see how to set up a simple pipeline with Mage. Sign up here: https://eventbrite.com/e/647017044397
Hi everyone!

The next iteration of the course is starting soon (in 1.5 months). In the meantime, you probably have a ton of questions.

That's why we're organizing a Q&A stream on Monday, December 18, at 17:00 CET, where we will answer all your questions.

Sign up here: https://lu.ma/1u1jlz4x

Also, on Monday (tomorrow) we will have a workshop that many of you will find relevant. We will talk about using Terraform to set up a data warehouse (ClickHouse).

Sign up here: https://lu.ma/5fil21de
Many of you ask how much time you should devote to the course. The answer we have given so far was "it depends".

But we actually collected some data from past editions of the course that can give a more accurate answer.

Here's the dataset: https://github.com/DataTalksClub/zoomcamp-analytics/tree/main/data/de-zoomcamp-2023 (it also contains data from other courses)

You can do some analytics and then share the results with us.

In this repo you can also find a notebook from Timur, our past student and teaching assistant, who did the analysis for the first edition of ML Zoomcamp. Half of his notebook is devoted to data cleaning, but the DE Zoomcamp 2023 data is much cleaner, so most of that is not needed anymore.
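
If you want a starting point for the analytics, here is a minimal sketch using only the Python standard library. It groups answers by module and takes the median number of hours. The column names `module` and `time_homework` are assumptions made for illustration, not the real schema, so check the CSV headers in the repo first. The sample rows are made up and only show how the function is called.

```python
import csv
import io
from statistics import median

# Hypothetical column names: check the real CSV headers in the repo first.
MODULE_COL = "module"
HOURS_COL = "time_homework"


def median_hours_by_module(csv_file):
    """Return {module: median hours}, skipping blank or non-numeric answers."""
    hours = {}
    for row in csv.DictReader(csv_file):
        try:
            module = row[MODULE_COL]
            value = float(row[HOURS_COL])
        except (KeyError, TypeError, ValueError):
            continue  # no usable answer in this row
        hours.setdefault(module, []).append(value)
    return {module: median(values) for module, values in hours.items()}


# Made-up rows, only to show the call; this is not real course data.
sample = io.StringIO(
    "module,time_homework\n"
    "hw1,4\n"
    "hw1,6\n"
    "hw1,\n"
    "hw2,10\n"
)
result = median_hours_by_module(sample)
print(result)
```

To run it on a real file, open the file with `open(path, newline="")` and pass the file object to `median_hours_by_module`. The median is used here instead of the mean because a few very large answers would otherwise skew the result.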

Have fun!
We're starting today at 17:00 CET! (That's in approximately 7 hours.)

You can ask your questions in advance using this link:

https://app.sli.do/event/su9wCLiM9nHnCwtGBfgicX

See you soon!
The form for submitting homework 1: https://courses.datatalks.club/de-zoomcamp-2024/homework/hw01

(Please ignore the "This homework is already scored. You didn't submit your answers." part - it's a bug which we'll fix later)

This platform is under active development, but hopefully you won't have any problems.

If you come across a problem, you can report it in the #course-management-platform channel.

If you're interested in how it works, the code is here: https://github.com/DataTalksClub/course-management-platform
Does anyone still have problems signing up for the course management platform to submit the homework?

Even if you haven't finished the homework yet, please try to sign up now, so you don't have any last-minute problems.

If you log in with Slack and it doesn't work, please try another authentication provider. If that doesn't work either, please send your email address to Alexey Grigorev in a DM.

Here's the link: https://courses.datatalks.club
"What's the homework URL?"

It's a very common question. I know you don't have time to check the FAQ, so I'm quoting Michael here:

It's your repository or any other location you have your code where a reasonable person would look at it and think yes, you went through the week and exercises.

By the way, here's the link to the FAQ:

https://docs.google.com/document/d/19bnYs80DwuUimHM65UV3sylsCn2j1vziPOwzBwQrebw/edit?usp=sharing