Data Lake Architecture: How To Level Up Your Business To The Data-Driven World
https://towardsdatascience.com/data-lake-architecture-for-a-heavy-machinery-dealer-how-to-level-up-your-business-to-the-b41145e86b15
Better late than never
An interesting case study in monetizing data scraping
20 million rubles a year from website scraping (vc.ru, Development section)
https://vc.ru/dev/496144-20-mln-rubley-v-god-na-parsinge-saytov
My name is Maxim Kulgin. My company, xmldatafeed, has been scraping websites in Russia for about four years. Every day we scrape more than 500 of the largest online stores in Russia. Now we are sharing our experience.
Data Engineering Wiki
It contains a constantly evolving collection of topics related to data engineering. Since we're at a very early stage, there's a lot of space to grow!
https://dataengineering.wiki/
Another open-source project, aimed primarily at teams that work with dbt
██████╗░██████╗░████████╗
██╔══██╗██╔══██╗╚══██╔══╝
██║░░██║██████╦╝░░░██║░░░
██║░░██║██╔══██╗░░░██║░░░
██████╔╝██████╦╝░░░██║░░░
╚═════╝░╚═════╝░░░░╚═╝░░░
Open-source data observability for analytics engineers
💬 Data anomaly monitoring as dbt tests: collect metrics and metadata over time and detect anomalies as native dbt tests in your project.
💬 Data observability report: generate a report covering all dbt tests and share it with your team.
💬 dbt artifacts uploader
💬 Slack alerts
💬 Data lineage made simple, reliable, and automated
👉 @devops_dataops
https://github.com/elementary-data/elementary
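Anomaly tests of this kind compare a new metric value (for example a table's daily row count or a column's null count) against that metric's own history. As a rough illustration only, the sketch below flags a value whose z-score against past values exceeds a threshold. The function name `is_anomalous`, the threshold of 3 and the sample row counts are assumptions made for this example; Elementary's real tests run inside dbt and are more involved.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], latest: float, sensitivity: float = 3.0) -> bool:
    """Hypothetical helper: flag `latest` if |z-score| against `history` exceeds `sensitivity`.

    `history` holds past values of one metric (e.g. daily row counts).
    The default threshold of 3 standard deviations is an assumption for this sketch.
    """
    if len(history) < 2:
        # Too few points to estimate a spread; treat the value as normal.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    z = (latest - mu) / sigma
    return abs(z) > sensitivity


# Made-up daily row counts for one table.
daily_row_counts = [1000, 1020, 980, 1010, 995, 1005, 990]
print(is_anomalous(daily_row_counts, 1003))  # False: within the usual range
print(is_anomalous(daily_row_counts, 400))   # True: a sharp drop in volume
```

A fixed z-score threshold is the simplest choice; real monitors also have to deal with seasonality (weekends, month ends), which this sketch ignores.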
Deep Dive on ClickHouse Sharding and Replication Webinar
Join the Altinity experts as we dig into ClickHouse sharding and replication, showing how they enable clusters that deliver fast queries over petabytes of data. We’ll start with basic definitions of each, then move to practical issues. This includes the setup of shards and replicas, defining schema, choosing sharding keys, loading data, and writing distributed queries. We’ll finish up with tips on performance optimization.
#ClickHouse
https://www.youtube.com/watch?v=Vuh6NOluIxo
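The webinar covers choosing sharding keys. To show why the key matters, here is a standalone Python sketch of the routing rule that the ClickHouse Distributed table engine documentation describes: the sharding expression is taken modulo the total shard weight, and the row goes to the shard whose half-open interval of remainders contains that value. The helper names, the two-shard cluster with weights 1 and 2, and the use of `zlib.crc32` as a stand-in for a ClickHouse hash such as `cityHash64` are all assumptions for illustration.

```python
import zlib


def pick_shard(sharding_value: int, weights: list[int]) -> int:
    """Hypothetical helper: return the 0-based index of the shard that receives a row.

    Follows the documented rule: remainder = value % total_weight, then pick the
    shard whose interval [prev_weights, prev_weights + weight) contains it.
    """
    total = sum(weights)
    remainder = sharding_value % total
    prev_weights = 0
    for index, weight in enumerate(weights):
        if prev_weights <= remainder < prev_weights + weight:
            return index
        prev_weights += weight
    raise AssertionError("unreachable: remainder is always below the total weight")


def sharding_key(user_id: str) -> int:
    # Stand-in for something like cityHash64(user_id) in ClickHouse;
    # crc32 is used only because it ships with Python.
    return zlib.crc32(user_id.encode("utf-8"))


weights = [1, 2]  # hypothetical cluster: shard 0 has weight 1, shard 1 has weight 2
for uid in ["alice", "bob", "carol"]:
    print(uid, "-> shard", pick_shard(sharding_key(uid), weights))
```

Because the shard is a pure function of the key, every row for a given user lands on the same shard, which keeps per-user aggregations local. A random key such as `rand()` spreads rows more evenly but gives up that locality.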
GitHub - kestra-io/kestra
Kestra is an orchestration and scheduling platform for creating, running, scheduling, and monitoring complex pipelines; the project describes itself as infinitely scalable, able to handle millions of pipelines.
https://github.com/kestra-io/kestra
Configuring iptables from scratch
A good video for getting to grips with how the Linux firewall works.
https://www.youtube.com/watch?v=Q0EC8kJlB64
Protecting network connections is now essential on practically every device connected to the Internet, and on a server all the more so. The video explains and demonstrates how to configure Linux's built-in firewall through the iptables interface.
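The core idea behind an iptables chain is first-match evaluation: rules are checked top to bottom, the first rule that matches with a terminating target (ACCEPT, DROP, REJECT) decides the packet's fate, and if nothing matches, the chain's default policy applies. The Python model below is a simplified sketch of just that logic; the `Rule` class, `evaluate_chain` and the packet dictionaries are hypothetical, and it ignores connection tracking, user-defined chains, non-terminating targets such as LOG, and CIDR ranges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """One simplified, hypothetical iptables rule; a None field matches anything."""
    target: str                     # terminating target: "ACCEPT", "DROP" or "REJECT"
    protocol: Optional[str] = None  # like -p tcp
    dport: Optional[int] = None     # like --dport 22
    source: Optional[str] = None    # like -s 203.0.113.7 (exact address only here)

    def matches(self, packet: dict) -> bool:
        return ((self.protocol is None or self.protocol == packet.get("protocol"))
                and (self.dport is None or self.dport == packet.get("dport"))
                and (self.source is None or self.source == packet.get("source")))


def evaluate_chain(rules: list[Rule], policy: str, packet: dict) -> str:
    """First matching rule decides; otherwise the chain's default policy applies."""
    for rule in rules:
        if rule.matches(packet):
            return rule.target
    return policy


# Roughly modelled on:
#   iptables -P INPUT DROP
#   iptables -A INPUT -s 203.0.113.7 -j DROP
#   iptables -A INPUT -p tcp --dport 22 -j ACCEPT
#   iptables -A INPUT -p tcp --dport 443 -j ACCEPT
input_chain = [
    Rule("DROP", source="203.0.113.7"),
    Rule("ACCEPT", protocol="tcp", dport=22),
    Rule("ACCEPT", protocol="tcp", dport=443),
]

print(evaluate_chain(input_chain, "DROP", {"protocol": "tcp", "dport": 22, "source": "198.51.100.10"}))    # ACCEPT
print(evaluate_chain(input_chain, "DROP", {"protocol": "tcp", "dport": 22, "source": "203.0.113.7"}))      # DROP
print(evaluate_chain(input_chain, "DROP", {"protocol": "tcp", "dport": 8080, "source": "198.51.100.10"}))  # DROP (policy)
```

Rule order matters: if the port-22 ACCEPT came before the DROP for 203.0.113.7, that address could still reach SSH, because the ACCEPT would match first.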
Apache Airflow 2.4 — Everything You Need to Know
Data-driven scheduling
https://medium.com/@astronomer.io/apache-airflow-2-4-everything-you-need-to-know-4eaccde15936
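In Airflow 2.4, data-driven scheduling works through `Dataset` objects: producer tasks declare the datasets they update via `outlets`, and a consumer DAG passes a list of datasets as its `schedule`. When a DAG depends on several datasets, it is triggered once all of them have been updated since its last run. The toy model below is plain Python, not Airflow's implementation; the `DatasetScheduler` class, the `build_report` DAG and the `s3://` URIs are hypothetical, and it simulates only that triggering rule.

```python
class DatasetScheduler:
    """Hypothetical toy model of dataset-triggered scheduling (not Airflow code).

    A consumer DAG lists the dataset URIs it depends on; it becomes due once
    every one of them has been updated since the consumer last ran.
    """

    def __init__(self) -> None:
        self.consumers: dict[str, set[str]] = {}  # dag_id -> required dataset URIs
        self.pending: dict[str, set[str]] = {}    # dag_id -> URIs updated since last run

    def register(self, dag_id: str, dataset_uris: list[str]) -> None:
        self.consumers[dag_id] = set(dataset_uris)
        self.pending[dag_id] = set()

    def dataset_updated(self, uri: str) -> list[str]:
        """Record that a producer task updated `uri`; return consumer DAGs due to run now."""
        triggered = []
        for dag_id, needed in self.consumers.items():
            if uri in needed:
                self.pending[dag_id].add(uri)
                if self.pending[dag_id] == needed:
                    triggered.append(dag_id)
                    self.pending[dag_id] = set()  # start waiting again after queuing a run
        return triggered


ORDERS = "s3://example-bucket/orders.parquet"        # hypothetical dataset URI
CUSTOMERS = "s3://example-bucket/customers.parquet"  # hypothetical dataset URI

scheduler = DatasetScheduler()
scheduler.register("build_report", [ORDERS, CUSTOMERS])
print(scheduler.dataset_updated(ORDERS))     # []
print(scheduler.dataset_updated(ORDERS))     # [] (still waiting for CUSTOMERS)
print(scheduler.dataset_updated(CUSTOMERS))  # ['build_report']
```

Note that two updates of `ORDERS` before `CUSTOMERS` arrives still produce a single run of the consumer, which matches the "all datasets updated since the last run" behavior described above.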
DevOops 2022 conference: talks and videos
▫️ https://devoops.ru/schedule/days/
▫️ https://www.youtube.com/c/DevOops_conf/videos
https://github.com/evgeniy-kharchenko/DevOps-Roadmap - DevOps Roadmap, a repository by one of the speakers
Google vs IBM Data Engineer Certificate - BEST Certificate for Data Engineers
0:00 Intro
1:24 Overview Google
2:42 Overview IBM
3:56 Outcomes
6:16 Similarities
7:53 Differences
11:47 My Personal Recommendation
https://youtu.be/UpMBdfg2ZrE
Data Engineering
A collection of one-off topics or videos that do not fall neatly into any other existing playlist.
1. A Brief History of Data Engineering | What is Data Engineering?
2. How to Become a Data Engineer (with no experience)
3. ETL vs ELT | Modern Data Architectures
4. YAML Tutorial | Learn YAML in 10 Minutes
5. What is Data Streaming?
6. 3 Must-Know Trends for Data Engineers | DataOps
7. What skills do you need as a Data Engineer?
8. What is Reverse ETL?
9. What tools should you know as a Data Engineer?
10. Intro to BASH // Command Line for Beginners
11. Getting Started w/ Airbyte! | Open Source Data Integration
12. Data Warehouse vs Data Lake | Explained (non-technical)
13. Data Modeling in the Modern Data Stack
14. Getting Started w/ Metabase | Open Source Data Visualization Tool
15. What do you actually do as a data engineer?
https://www.youtube.com/playlist?list=PLy4OcwImJzBKg3rmROyI_CBBAYlQISkOO
How to Make Data Documentation Sexy
The job that nobody wants to do gets a rebrand.
https://medium.com/geekculture/how-to-make-data-documentation-sexy-c0ef0d696f78
Forwarded from karpov.courses
Good news: we've made a free Docker course.
Docker is used in data science, software development, data engineering, and even testing. We're confident the course will be useful to anyone who writes code and works with applications.
You will learn to:
● package your own applications into containers;
● run ready-made services locally: Airflow, Postgres, ClickHouse, Nginx;
● deploy and configure full-fledged web applications.
The course gives you the foundational knowledge to take the next step toward even more interesting tools, such as Kubernetes.
Course author: Anton Sidorin, backend developer at karpov.courses.
You can start learning whenever it suits you.
A collection of articles on data analytics workflows and building pipelines:
🔸Capturing Data Analytics Workflows and System Requirements
🔸DevOps for DataOps: Building a CI/CD Pipeline for Apache Airflow DAGs
🔸Intro to data science on Google Cloud
🔸Building a Spark and Airflow development environment with Docker
🔸Apache Airflow for Beginners - Build Your First Data Pipeline
(DataCamp) Introduction to Airflow in Python
This is a memo sharing what I have learnt about Apache Airflow, capturing the learning objectives as well as my personal notes. The course is taught by Mike Metzger of DataCamp and has four chapters:
▫️ Intro to Airflow
▫️ Implementing Airflow DAGs
▫️ Maintaining and monitoring Airflow workflows
▫️ Building production pipelines in Airflow
https://github.com/JNYH/DataCamp_Introduction_to_Airflow
Personal Notes:
https://medium.com/swlh/introduction-to-airflow-in-python-67b554f06f0b
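The "Implementing Airflow DAGs" chapter centers on declaring task dependencies, which in Airflow is usually done with the bitshift operator `>>` (`a >> b` means `b` runs after `a`). The sketch below is a toy reimplementation meant only to show what that operator records and how a valid run order follows from it; the `Task` class, `execution_order` and the task IDs are hypothetical, and the ordering uses Python's standard `graphlib` module rather than Airflow's scheduler.

```python
from graphlib import TopologicalSorter


class Task:
    """Hypothetical stand-in for an Airflow operator; only records upstream task IDs."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.upstream: set[str] = set()

    def __rshift__(self, other):
        # `a >> b` or `a >> [b, c]`: the right-hand task(s) depend on `a`.
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            target.upstream.add(self.task_id)
        return other  # returning the right operand allows chains like a >> b >> c

    def __rrshift__(self, other):
        # `[a, b] >> c`: lists have no `>>`, so Python calls c.__rrshift__([a, b]).
        for source in other:
            self.upstream.add(source.task_id)
        return self


def execution_order(tasks: list[Task]) -> list[str]:
    """Return one valid run order: every task appears after all of its upstream tasks."""
    graph = {task.task_id: task.upstream for task in tasks}
    return list(TopologicalSorter(graph).static_order())


extract = Task("extract")
clean_orders = Task("clean_orders")
clean_customers = Task("clean_customers")
load = Task("load")

# Fan out after extract, then fan back in before load.
extract >> [clean_orders, clean_customers] >> load
print(execution_order([extract, clean_orders, clean_customers, load]))
```

The two cleaning tasks have no dependency on each other, so either may come first in the printed order; a real scheduler is free to run them in parallel.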
Forwarded from Airbyte - ETL ELT Data Pipelines