Apache Airflow 2.4 — Everything You Need to Know
Data-driven scheduling
👉 @devops_dataops
https://medium.com/@astronomer.io/apache-airflow-2-4-everything-you-need-to-know-4eaccde15936
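A minimal sketch of the data-driven (Dataset) scheduling that headlines the 2.4 release; the dataset URI, DAG ids, and callables below are placeholders, not taken from the article:
```python
# Hypothetical Airflow 2.4 Dataset (data-driven) scheduling example.
# The dataset URI and dag ids are placeholders for illustration only.
from datetime import datetime

from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.python import PythonOperator

orders = Dataset("s3://example-bucket/orders.parquet")  # placeholder URI

# Producer DAG: declaring the Dataset as an outlet marks it as "updated"
# whenever this task succeeds.
with DAG("produce_orders", start_date=datetime(2022, 1, 1), schedule="@daily") as producer:
    PythonOperator(
        task_id="write_orders",
        python_callable=lambda: print("writing orders"),
        outlets=[orders],
    )

# Consumer DAG: instead of a cron schedule, it runs whenever the Dataset
# it depends on has been updated by a producer run.
with DAG("consume_orders", start_date=datetime(2022, 1, 1), schedule=[orders]) as consumer:
    PythonOperator(
        task_id="read_orders",
        python_callable=lambda: print("reading orders"),
    )
```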
👍2
DevOops 2022 conference - talks/videos
▫️ https://devoops.ru/schedule/days/
▫️ https://www.youtube.com/c/DevOops_conf/videos
https://github.com/evgeniy-kharchenko/DevOps-Roadmap - the DevOps Roadmap repository from one of the speakers
👍1
Google vs IBM Data Engineer Certificate - BEST Certificate for Data Engineers
0:00 Intro
1:24 Overview Google
2:42 Overview IBM
3:56 Outcomes
6:16 Similarities
7:53 Differences
11:47 My Personal Recommendation
https://youtu.be/UpMBdfg2ZrE
YouTube
Google vs IBM Data Engineer Certificate - BEST Certificate for Data Engineers
IBM vs Google's Data Engineer Certificates.
Many of you have asked for this for the past 6-7 months, and now I have finally gotten around to reviewing both of them!
So which one of these data engineering certificates is best for you?
Check out the GCP Data Engineering…
Data Engineering
A collection of one-off topics or videos that do not fall neatly into any other existing playlist.
1. A Brief History of Data Engineering | What is Data Engineering?
2. How to Become a Data Engineer (with no experience)
3. ETL vs ELT | Modern Data Architectures
4. YAML Tutorial | Learn YAML in 10 Minutes
5. What is Data Streaming?
6. 3 Must-Know Trends for Data Engineers | DataOps
7. What skills do you need as a Data Engineer?
8. What is Reverse ETL?
9. What tools should you know as a Data Engineer?
10. Intro to BASH // Command Line for Beginners
11. Getting Started w/ Airbyte! | Open Source Data Integration
12. Data Warehouse vs Data Lake | Explained (non-technical)
13. Data Modeling in the Modern Data Stack
14. Getting Started w/ Metabase | Open Source Data Visualization Tool
15. What do you actually do as a data engineer?
👉 @devops_dataops
https://www.youtube.com/playlist?list=PLy4OcwImJzBKg3rmROyI_CBBAYlQISkOO
👍2
How to Make Data Documentation Sexy
The job that nobody wants to do gets a rebrand.
https://medium.com/geekculture/how-to-make-data-documentation-sexy-c0ef0d696f78
👍1
Forwarded from karpov.courses
We have good news: we've made a free Docker course.
Docker is used in Data Science, software development, data engineering and even testing! We're sure the program will be useful to anyone who writes code and works with applications.
You will learn how to:
● package your own applications into containers;
● spin up ready-made services locally: Airflow, Postgres, ClickHouse, Nginx;
● bring up and configure full-fledged web applications.
The program will give you the foundation to take a step toward even more interesting tools, such as Kubernetes.
The course author is Anton Sidorin, a backend developer at karpov.courses.
You can start learning whenever it suits you.
[Get started with Docker]
👍3
Article collection about data analytics workflows and building pipelines:
🔸Capturing Data Analytics Workflows and System Requirements
🔸DevOps for DataOps: Building a CI/CD Pipeline for Apache Airflow DAGs (see the DAG integrity test sketch after this list)
🔸Intro to data science on Google Cloud
🔸Building a Spark and Airflow development environment with Docker
🔸Apache Airflow for Beginners - Build Your First Data Pipeline
📍 @devops_dataops
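As a taste of the CI/CD article above, one common first stage in such a pipeline is an import/integrity test for the DAGs. A minimal pytest sketch, assuming DAG files live in a dags/ folder (the path is an assumption, not from the article):
```python
# Hypothetical DAG integrity check, typically run in CI before deploying DAGs.
# The "dags" folder path is a placeholder.
from airflow.models import DagBag


def test_dags_import_without_errors():
    # Parse every DAG file in the folder, skipping Airflow's bundled examples.
    dag_bag = DagBag(dag_folder="dags", include_examples=False)
    # import_errors maps file paths to the exception raised while parsing them.
    assert dag_bag.import_errors == {}, f"DAG import failures: {dag_bag.import_errors}"
    assert len(dag_bag.dags) > 0, "No DAGs were found in the dags/ folder"
```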
👍2
(DataCamp) Introduction to Airflow in Python
This is a memo to share what I have learnt in Apache Airflow, capturing the learning objectives as well as my personal notes. The course is taught by Mike Metzger from DataCamp, and it includes 4 chapters:
▫️ Intro to Airflow
▫️ Implementing Airflow DAGs
▫️ Maintaining and monitoring Airflow workflows
▫️ Building production pipelines in Airflow
https://github.com/JNYH/DataCamp_Introduction_to_Airflow
Personal Notes:
https://medium.com/swlh/introduction-to-airflow-in-python-67b554f06f0b
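For a flavour of what the "Implementing Airflow DAGs" chapter covers, here is a minimal DAG sketch; the dag_id, schedule and bash commands are placeholders rather than course material:
```python
# Minimal Airflow DAG sketch; ids, dates and commands are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_etl",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",   # run once per day
    catchup=False,                # do not backfill past runs
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")

    extract >> load  # load runs only after extract succeeds
```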
Forwarded from Airbyte - ETL ELT Data Pipelines
Airflow Tutorial for Beginners - Full Course in 2 Hours 2022
Throughout the course, you will learn (a minimal TaskFlow/XCom sketch is included further below):
00:00 - Airflow Introduction
03:06 - Run Airflow in Python Env
10:44 - Run Airflow in Docker
17:55 - Airflow Basics and Core Concepts
21:55 - Airflow Task Lifecycle
26:19 - Airflow Basic Architecture
28:14 - Airflow DAG with Bash Operator
40:09 - Airflow DAG with Python Operator
45:04 - Data Sharing via Airflow XComs
52:53 - Airflow Task Flow API
57:56 - Airflow Catch-Up and Backfill
01:02:09 - Airflow Scheduler with Cron Expression
01:07:25 - Airflow Connection to Postgres
01:08:58 - Airflow Postgres Operator
01:19:30 - Airflow Docker Install Python Package 2 ways
01:29:34 - Airflow AWS S3 Sensor Operator
01:42:37 - Airflow Hooks S3 PostgreSQL
02:00:43 - Course Bonus
https://www.youtube.com/watch?v=K9AnJ9_ZAXE
YouTube
Airflow Tutorial for Beginners - Full Course in 2 Hours 2022
#Airflow #AirflowTutorial #Coder2j
In this 2-hour Airflow Tutorial for Beginners Full Course, we combine theory explanation and practical demos to help you…
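A rough sketch of the XCom / TaskFlow pattern covered at 45:04 and 52:53, assuming Airflow 2.x; the function names and data are invented for illustration:
```python
# Hypothetical TaskFlow API example: return values are passed between tasks
# via XComs automatically. Function names and data are placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(start_date=datetime(2022, 1, 1), schedule_interval="@daily", catchup=False)
def taskflow_example():

    @task
    def extract() -> dict:
        # The returned dict is pushed to XCom behind the scenes.
        return {"name": "Jane", "age": 30}

    @task
    def greet(user: dict) -> None:
        # Pulled from XCom and passed in as a regular argument.
        print(f"Hello {user['name']}, age {user['age']}")

    greet(extract())


taskflow_example()
```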
👍6
Apache Spark / PySpark Tutorial: Basics In 15 Mins
This video gives an introduction to the Spark ecosystem and world of Big Data, using the Python Programming Language and its PySpark API. We also discuss the idea of parallel and distributed computing, and computing on a cluster of machines.
https://youtu.be/QLQsW8VbTN4
YouTube
Apache Spark / PySpark Tutorial: Basics In 15 Mins
Thank you for watching the video! Here is the notebook: https://github.com/gahogg/YouTube-I-mostly-use-colab-now-/blob/master/PySpark%20In%2015%20Minutes.ipynb
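A tiny PySpark sketch of the ideas described above (a local SparkSession standing in for a cluster, transformations executed in parallel across partitions); the data and column names are made up:
```python
# Minimal PySpark example; the data and column names are placeholders.
from pyspark.sql import SparkSession

# local[*] runs Spark on all local cores, simulating a cluster on one machine.
spark = SparkSession.builder.master("local[*]").appName("intro").getOrCreate()

df = spark.createDataFrame(
    [("alice", 34), ("bob", 45), ("carol", 29)],
    ["name", "age"],
)

# Transformations are lazy and distributed across partitions;
# show() triggers the actual computation.
df.filter(df.age > 30).show()

spark.stop()
```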
👍2
_Mastering Ubuntu Server - Third Edition.pdf
15.2 MB
Mastering Ubuntu Server - Third Edition
https://github.com/PacktPublishing/Mastering-Ubuntu-Server_Third-Edition
Yacht vs. Portainer - Docker dashboard comparison - Virtualization Howto
https://www.virtualizationhowto.com/2022/12/yacht-vs-portainer-docker-dashboard-comparison/
Virtualization Howto
Yacht vs. Portainer - Docker dashboard comparison
Yacht vs. Portainer - Docker dashboard comparison. A look at Yacht, the Portainer alternative and which one you should use.
Source: https://www.linkedin.com/posts/timo-dechau_in-our-little-data-world-are-we-naming-things-activity-6925303646817529856-Nu-U/
---
In our little data world, are we naming things too much based on our marketing perspective? And is there serious over-selling going on?
Maybe yes.
Let’s do some examples:
dbt is not a data model tool. I see this notion quite often. It's first and foremost a SQL orchestration and testing tool. Of course, I can use it to build and manage a data model. But that requires me to do the thinking, not dbt.
Snowflake and BigQuery are not data warehouses. Great people like Rogier Werschkull and Chad Sanderson remind us of that. They are analytical databases in the cloud. Of course, you can build a data warehouse with them. But that requires you to come up with a concept and architecture.
Fivetran and Airbyte are not ELT tools - they extract and load for you, and you are in charge of the transformation. They are basically supermarkets with self-checkout. A great idea, but you have to do more.
Segment and Rudderstack are not really CDPs - Arpit Choudhury has written a great piece about this - they are customer data infrastructure: the collection and identity-stitching layer.
Reverse ETL is just ETL.
Why is this important?
Because often these labels create expectations about the solution that these tools can’t fulfill.
When I set up Snowflake and think that I have a data warehouse now - I create huge expectations in my organization that I can’t fulfill.
Same with dbt - OK, we need a data model, let's use dbt for this. And then you add one SQL file to the next and call it a model.
Tools are tools, just that.
---
From DataCamp
1. Introduction to Airflow
2. Airflow DAGs
3. Airflow web interface
Kubernetes Cheat Sheet - Page 1 & Page 2