Data Engineering / Инженерия данных / Data Engineer / DWH
Data Engineering: ETL / DWH / data pipelines based on open-source software.

DWH / SQL
Python / ETL / ELT / dbt / Spark
Apache Airflow

I do not run ads.
Questions: @iv_shamaev | datatalks.ru
Prescriber-ETL-data-pipeline

An end-to-end ETL data pipeline that uses PySpark parallel processing to handle about 25 million rows of data from a SaaS application. Apache Airflow orchestrates the pipeline, the data lands in several data warehouse technologies, and Apache Superset connects to the DWH to generate BI dashboards for weekly reports.

https://github.com/judeleonard/Prescriber-ETL-data-pipeline
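
To make the extract / transform / load stages described above concrete, here is a minimal sketch. It is not the code from the linked repository: it uses only the Python standard library, with an in-memory SQLite database standing in for the DWH, and the CSV columns (`prescriber_id`, `state`, `drug`, `claims`) are invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the column names are invented for illustration.
RAW_CSV = """prescriber_id,state,drug,claims
1,ny,Aspirin,10
2,CA,ibuprofen,
3,ny,Aspirin,5
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows without a claim count; normalise casing and types."""
    cleaned = []
    for row in rows:
        if not row["claims"]:
            continue  # skip incomplete records
        cleaned.append(
            (
                int(row["prescriber_id"]),
                row["state"].upper(),
                row["drug"].title(),
                int(row["claims"]),
            )
        )
    return cleaned


def load(conn, rows):
    """Write cleaned rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS claims "
        "(prescriber_id INTEGER, state TEXT, drug TEXT, claims INTEGER)"
    )
    conn.executemany("INSERT INTO claims VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn, text=RAW_CSV):
    """Run the three stages, then return a small per-state report."""
    load(conn, transform(extract(text)))
    query = "SELECT state, SUM(claims) FROM claims GROUP BY state ORDER BY state"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
report = run_pipeline(conn)
print(report)
```

In a setup like the one the post describes, each of these functions would typically become a separate orchestrated task, and the transform step would run on Spark rather than in a Python loop.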
OpenMetadata vs DataHub

One of the points against DataHub is its annoying way of opening Data Lineage.
Why there is no button to expand the whole tree at once is a mystery to me.
So far, OpenMetadata is ahead in the OpenMetadata vs DataHub comparison.
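
For readers unfamiliar with what "expanding the whole tree" means for lineage, here is a rough sketch of the idea. It does not use the API of either product: the lineage graph is a hypothetical plain dictionary, and `expand_all` is an illustrative helper that walks every downstream dataset breadth-first.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets derived from it.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "raw.customers": ["staging.customers"],
    "staging.orders": ["mart.sales"],
    "staging.customers": ["mart.sales"],
    "mart.sales": ["dashboard.weekly"],
}


def expand_all(graph, root):
    """Return every node downstream of root mapped to its depth."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in depths:  # visit each dataset only once
                depths[child] = depths[node] + 1
                queue.append(child)
    return depths


tree = expand_all(LINEAGE, "raw.orders")
for name, depth in tree.items():
    print("  " * depth + name)
```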
Data Engineering with Python.pdf
10.5 MB
Data Engineering with Python
Packt Publishing

Key Features
▫️Become well-versed in data architectures, data preparation, and data optimization skills with the help of practical examples
▫️Design data models and learn how to extract, transform, and load (ETL) data using Python
▫️Schedule, automate, and monitor complex data pipelines in production

👉 @devops_dataops
Data Engineering - Open Source Tools/Databases

A curated list of docker-compose files prepared for testing data engineering tools, databases and open source libraries.

Airflow
Cassandra
ClickHouse
Drill
Druid
ELK
Grafana-Prometheus
Hadoop
Kafka
LakeFS
MariaDB
MinIO
Postgres
Redis
Spark
Superset
Trino
MongoDB


https://github.com/irbigdata/data-dockerfiles
mad2023.pdf
26.8 MB
The 2023 MAD (Machine Learning, Artificial Intelligence & Data) Landscape – Matt Turck

Source: https://mattturck.com/mad2023/
A selection of projects from GitHub

〰️〰️〰️〰️〰️〰️〰️〰️
🔸 Engineering Python
Welcome to Engineering Python. This is a Python programming course for engineers.

This GitHub repository hosts the Jupyter Notebooks and Python source code for the open course on YouTube (http://youtube.com/yongtwang).

A tutorial on how to use these course materials is in this YouTube video:
02C Course Materials and Jupyter Notebook.

〰️〰️〰️〰️〰️〰️〰️〰️
🔸 Fun and useful projects with Python
You can find the corresponding tutorials on my channel: https://www.youtube.com/c/PythonEngineer

〰️〰️〰️〰️〰️〰️〰️〰️
🔸 Python Engineer Roadmap
Python can be used in a lot of computer science fields. In this repository, we have collected resources for each field of computer science that are related to Python.

〰️〰️〰️〰️〰️〰️〰️〰️
🔸 PyTorch Beginner Tutorials from my YouTube channel

• Installation
• Tensor Basics
• Autograd
• Backpropagation
• Gradient Descent With Autograd and Backpropagation
• Training Pipeline: Model, Loss, and Optimizer
• Linear Regression
• Logistic Regression
• Dataset and DataLoader
• Dataset Transforms
• Softmax And Cross Entropy
• Activation Functions
• Feed-Forward Neural Net
• Convolutional Neural Net (CNN)
• Transfer Learning
• Tensorboard
• Save and Load Models
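
As a taste of the gradient descent and linear regression topics in the list above, here is a small sketch. It is not taken from the tutorials and deliberately does not use PyTorch: the gradients of the mean squared error are derived by hand in plain Python, which is the step that autograd would otherwise automate. The data, learning rate and epoch count are assumptions chosen for illustration.

```python
def train_linear_regression(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward pass: predictions with the current parameters.
        preds = [w * x + b for x in xs]
        # Backward pass: hand-derived gradients of the MSE loss.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Update step: move against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = train_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```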