Andreas Kretz - Learn Data Engineering
Learn Data Engineering with Andreas Kretz
NEW COURSE!!! Modern Data Warehouses no longer need you to load the data into them. Many warehouses like AWS Redshift, BigQuery or Snowflake allow you to query data directly from files in your Data Lake. This Data Lake integration is the key to flexibility in how you interact with your data. It is what makes a modern Data Warehouse so nice to use for all kinds of analytics workloads.

In this course you will learn how easy it is to use Data Lakes, Warehouses and BI tools. Load your files into the lake and visualize them in a report. 🚀

Course Contents
Where Data Warehouses fit into a platform
Data Warehouses ETL vs ELT data integration
Direct Access of Data Lake?
Data Warehouses and Data Lakes on AWS & GCP
GCP hands-on example with Cloud Storage, BigQuery, & Data Studio
AWS hands-on example with S3, Glue, Athena, and QuickSight
Example with AWS Redshift


https://learndataengineering.com/p/modern-data-warehouses
Most of the time, moving everything to one cloud is not possible. So should we just keep the data where it is and connect it where possible? That could be a practical approach. Always use the right tool for the job, or is this too complicated?

Link to the video on YouTube: https://youtu.be/bsSUa1CrWqo
Cloud billing can be very difficult to forecast. There are so many variables in play. Are you looking for a solution? In this post @Zach Quinn shows you a way to use BigQuery's metadata to calculate and forecast the costs. Really cool idea! Makes me wonder what other services offer this kind of metadata, too.

Here is a quick look at the article:
Data engineers can leverage SQL statements to fetch database metadata in order to calculate costs incurred with PaaS products like BigQuery.
In the article you’ll learn:
- How to access table metadata in BigQuery
- How to use standard SQL to convert bytes to GB and TB
- How to calculate per gigabyte rates
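
To make the arithmetic concrete, here is a minimal Python sketch of the byte conversion and per-gigabyte calculation. The article itself does this in SQL against BigQuery's table metadata; this only mirrors the math. The rate, the table names and the sizes are hypothetical placeholders, not real BigQuery prices, and the sketch assumes binary units (1 GB = 1024^3 bytes).

```python
# Hypothetical rate for illustration only; look up the current price for your region.
ASSUMED_USD_PER_GB_MONTH = 0.02

BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabytes


def bytes_to_gb(size_bytes):
    """Convert a size in bytes to gigabytes."""
    return size_bytes / BYTES_PER_GB


def bytes_to_tb(size_bytes):
    """Convert a size in bytes to terabytes."""
    return size_bytes / BYTES_PER_TB


def monthly_storage_cost(size_bytes, usd_per_gb=ASSUMED_USD_PER_GB_MONTH):
    """Estimate the monthly storage cost in USD, rounded to cents."""
    return round(bytes_to_gb(size_bytes) * usd_per_gb, 2)


def estimate_costs(table_sizes, usd_per_gb=ASSUMED_USD_PER_GB_MONTH):
    """Map each table name to its estimated monthly cost.

    `table_sizes` is a dict of table name -> size in bytes, e.g. the values
    you would read from the warehouse's table metadata.
    """
    return {
        name: monthly_storage_cost(size, usd_per_gb)
        for name, size in table_sizes.items()
    }


if __name__ == "__main__":
    # Made-up table sizes, just to show the call.
    sizes = {"events": 100 * BYTES_PER_GB, "users": 5 * BYTES_PER_GB}
    print(estimate_costs(sizes))
```

Swap in the byte counts from your own metadata query and the rate from the pricing page to get a rough forecast.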

Read "How Data Engineers Can Use SQL to Estimate BigQuery Storage Costs" in our publication "Plumbers of Data Science" on Medium.

https://medium.com/plumbersofdatascience/how-data-engineers-can-use-sql-to-estimate-bigquery-storage-costs-cbcdfca18899
I love the smell of some nice SQL in the morning! I keep telling you: for structured data use cases, SQL was and still is the gold standard. Here's another great article from Zach about creating tables, writing nested queries and more.

Preview of the content:
- How to effectively use subqueries to structure complex SQL queries
- How to use dynamic date filters to avoid hard coding date ranges 
- How to use conditional logic to return a decision
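
A small sketch of how these three ideas can fit together in one query, run from Python against an in-memory SQLite database. This is not the query from the article: the `spending` table, its columns, the 30-day window and the budget threshold are all assumptions made up for illustration, and SQLite's `date()` modifier syntax differs from BigQuery's date functions.

```python
import sqlite3
from datetime import date


def spending_decision(rows, budget, today=None):
    """Return (total, decision) for the 30 days up to `today`.

    rows:   iterable of (ISO date string, amount) tuples (hypothetical schema)
    budget: threshold used by the CASE expression
    """
    today = today or date.today()
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spending (spent_on TEXT, amount REAL)")
    conn.executemany("INSERT INTO spending VALUES (?, ?)", rows)

    query = """
        SELECT total,
               -- conditional logic: turn the number into a decision
               CASE WHEN total > :budget THEN 'over budget'
                    ELSE 'within budget'
               END AS decision
        FROM (
            -- subquery: aggregate first, decide afterwards
            SELECT COALESCE(SUM(amount), 0) AS total
            FROM spending
            -- dynamic date filter: relative to :today, no hard-coded range
            WHERE spent_on >= date(:today, '-30 days')
        )
    """
    result = conn.execute(
        query, {"budget": budget, "today": today.isoformat()}
    ).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("2022-01-20", 40.0), ("2022-01-05", 70.0), ("2021-11-01", 500.0)]
    print(spending_decision(sample, budget=100, today=date(2022, 1, 31)))
```

Passing `today` as a parameter keeps the filter dynamic in production (it defaults to the current date) while still letting you pin a date when you test the query.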

Read "Data Engineering IRL: How to Use SQL to Track Your Spending" in our publication "Plumbers of Data Science" on Medium.

https://medium.com/plumbersofdatascience/date-engineering-irl-how-to-use-sql-to-track-your-spending-79f47512af2b
Check out Benjamin's post about 2022 predictions for Data Engineering. I also sent him my 2 cents.
Make sure that you follow him on LinkedIn and on YouTube. He makes really great videos (I'm a bit jealous).
Ben also just quit his job at Facebook to start his own company.
BIG CONGRATS!!!!

https://seattledataguy.substack.com/p/5-big-data-experts-predictions-for
I just released a NEW course: Python for Data Engineers!

You come from a different field and haven't coded before? No problem! This course picks up where our Python 1 course from @Amit Jain left off.

You learn all the important basics a Data Engineer needs: from advanced Python features and transforming data with pandas to working with APIs and Postgres databases.

Kristijan Bakaric and I created hands-on examples for every lesson. In 2.5 hours of videos we go through each of them together. We also put the source code on our GitHub.
🚀

Course Content:
Exception handling
Understand what classes and objects are
How to use modules
Log messages to files
How to work with dates and JSON
Understand unit tests and data validation
Pandas to transform your data
NumPy to apply mathematical functions
Working with Postgres
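
To give a feel for how a few of these topics combine, here is a minimal sketch (not taken from the course material) that parses a JSON record, converts its timestamp to a date object, handles bad input with exceptions, and logs problems to a file. The record layout, the `created_at` field and the `pipeline.log` filename are assumptions for illustration.

```python
import json
import logging
from datetime import datetime

logger = logging.getLogger("ingest")


def configure_file_logging(path="pipeline.log"):
    """Send log messages from this module to a file (hypothetical path)."""
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)


def parse_event(raw):
    """Parse one JSON string into a dict with a real datetime.

    Returns None (and logs a warning) instead of crashing when the
    record is not valid JSON or has a missing or malformed timestamp.
    """
    try:
        record = json.loads(raw)
        record["created_at"] = datetime.fromisoformat(record["created_at"])
        return record
    except (json.JSONDecodeError, KeyError, ValueError, TypeError) as exc:
        logger.warning("Skipping bad record %r: %s", raw, exc)
        return None


if __name__ == "__main__":
    configure_file_logging()
    lines = ['{"id": 1, "created_at": "2022-01-15T10:30:00"}', "not json"]
    events = [e for e in map(parse_event, lines) if e is not None]
    logger.info("Parsed %d of %d records", len(events), len(lines))
```

Returning None for bad records keeps a single broken line from stopping the whole run, while the log file still tells you what was skipped.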

https://learndataengineering.com/p/python-for-data-engineers
The new Machine Learning & Containerization on AWS project is online!! 🚀
As always, active members of the Data Engineering Academy already have access to the course.

What the course is about:
In this example project you learn how to create a data pipeline that pulls data from the Twitter API, then analyzes, stores and visualizes it.
You will host your Machine Learning algorithm on AWS using Lambda and set up your own Postgres database with RDS. You create a Streamlit dashboard and gain experience hosting it using Elastic Container Registry (ECR) and Elastic Container Service (ECS).

This project also gives you insights on how to handle dependency management with Poetry.
Have fun!

Course Content:
Set up and configure Twitter API
Launching RDS Postgres DB
Create S3 bucket for raw storage
Create ML Lambda that extracts & analyses Tweets
Schedule Lambda with EventBridge
Create Streamlit visualization app
Dependency management with Poetry & create Docker image
Install & configure AWS CLI
Set up Elastic Container Registry ECR
Create Elastic Container Service ECS Fargate cluster
Run our Streamlit app as ECS task

Learn how to build an NLP pipeline and host containers in the cloud

Link to the course in the Data Engineering Academy - Trusted by over 500 students: https://learndataengineering.com/p/ml-on-aws
Join the Data Engineering Discord server! https://discord.gg/Wxy2mQA7Fy
Andreas Kretz - Learn Data Engineering pinned «Chat with Andreas about how the Academy and the Full-Stack Coaching can help you: https://t.me/+V87Cha4h1_VGR9ac»
Channel name was changed to «Andreas Kretz - Learn Data Engineering»
New Podcast episode!

In today's episode, I’m talking with Tom Schamberger from msg. He leads their cloud data platform team and brings a ton of experience from consulting, startups, and platform design.
We talked about:
The skills data engineers actually need in consulting
Why soft skills are underrated
Tool debates (Databricks, Snowflake, SAP…)
What consulting projects really look like
and more!

Listen to the episode here: https://creators.spotify.com/pod/show/andreaskayy/episodes/125-These-Skills-Get-You-a-Data-Consulting-Job--with-Tom-Schamberger-e35at82
Or watch on YouTube: https://youtu.be/jWaVtLNYNIw