Data Build Tool (dbt). Transformation in Modern data stack | by Amit Singh Rathore | Jan, 2023 | Dev Genius
https://blog.devgenius.io/data-build-tool-dbt-1f0b03d97cc6
How to push an npm package to GitLab; this one helped 👍
Publishing your private npm packages to Gitlab NPM Registry
https://shivamarora.medium.com/publishing-your-private-npm-packages-to-gitlab-npm-registry-39d30a791085
Configure npm, yarn, lerna to publish packages to Gitlab Package Registry and use them as dependencies in your project
Yet another tool roundup: Awesome-Selfhosted
A list of Free Software network services and web applications which can be hosted on your own servers
https://github.com/awesome-selfhosted/awesome-selfhosted
👍1
Prescriber-ETL-data-pipeline
An end-to-end ETL data pipeline that uses PySpark parallel processing to handle about 25 million rows of data from a SaaS application. Apache Airflow orchestrates the pipeline, the data lands in various data warehouse technologies, and Apache Superset connects to the DWH to generate BI dashboards for weekly reports.
https://github.com/judeleonard/Prescriber-ETL-data-pipeline
👍1
airflow-docker
This is my Apache Airflow Local development setup on Windows 10 WSL2/Mac using docker-compose. It will also include some sample DAGs and workflows.
https://github.com/anilkulkarni87/airflow-docker
A decent comparison table of metadata management tools
Awesome Data Discovery and Observability
https://github.com/opendatadiscovery/awesome-data-catalogs
👍1
Examples from a course on Apache Airflow 2.0
https://github.com/adilkhash/apache-airflow-course-materials
❤1👍1🔥1
How to Orchestrate an ETL Data Pipeline with Apache Airflow
https://www.freecodecamp.org/news/orchestrate-an-etl-data-pipeline-with-apache-airflow/
By Aviator Ifeanyichukwu. Data orchestration involves using different tools and technologies together to extract, transform, and load (ETL) data from multiple sources into a central repository. Data orchestration typically involves a combination of t...
🔥1
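To make the extract, transform, and load steps from the freeCodeCamp post above concrete, here is a minimal sketch in plain Python. It is not taken from the linked article and does not use Airflow itself. The function names (`extract`, `transform`, `load`, `run_pipeline`), the sample records, and the in-memory SQLite database standing in for the "central repository" are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical raw records standing in for two separate sources.
SOURCE_A = [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "3"}]
SOURCE_B = [{"id": 3, "amount": "7.25"}]


def extract():
    """Collect raw rows from every source into one list."""
    return SOURCE_A + SOURCE_B


def transform(rows):
    """Cast string amounts to float and return (id, amount) tuples."""
    return [(row["id"], float(row["amount"])) for row in rows]


def load(rows, conn):
    """Write the cleaned rows into a single target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT INTO payments (id, amount) VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run the three steps in order, the way an orchestrator would sequence them."""
    load(transform(extract()), conn)


# Assumption: an in-memory SQLite database plays the role of the central repository.
conn = sqlite3.connect(":memory:")
run_pipeline(conn)
```

In an orchestrator such as Airflow, each of these functions would usually become its own task, so a failed step can be retried without rerunning the ones before it.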
OpenMetadata vs DataHub
One of the cons of DataHub is its annoying way of expanding Data Lineage.
Why there can't be a button that opens the whole tree is a mystery to me.
So far, OpenMetadata is in the lead in the OpenMetadata vs DataHub comparison.
👍1
Data Engineering with Python.pdf
10.5 MB
Data Engineering with Python
Packt Publishing
Key Features
▫️Become well-versed in data architectures, data preparation, and data optimization skills with the help of practical examples
▫️Design data models and learn how to extract, transform, and load (ETL) data using Python
▫️Schedule, automate, and monitor complex data pipelines in production
👉 @devops_dataops
🔥3
Data Engineering - Open Source Tools/Databases
A curated list of docker-compose files prepared for testing data engineering tools, databases and open source libraries.
Airflow
Cassandra
ClickHouse
Drill
Druid
ELK
Grafana-Prometheus
Hadoop
Kafka
LakeFS
Mariadb
Minio
Postgres
Redis
Spark
Superset
Trino
mongo
https://github.com/irbigdata/data-dockerfiles
Apache Druid in 5 minutes
https://youtu.be/X8ZnwwmCBAA
Apache Druid is a real-time analytics database used by 1000s of companies like Netflix, Confluent, Salesforce, and Target. But what's the big deal? Why use Druid instead of a data warehouse - like Snowflake, BigQuery, or Redshift - or an operational database…
PySpark Tutorial
https://youtu.be/_C8kWso4ne4
GitHub code: https://github.com/krishnaik06/Pyspark-With-Python
Learn PySpark, an interface for Apache Spark in Python. PySpark is often used for large-scale data processing and machine learning.
✏️ Course from Krish Naik. Check out his channel: https://you…
mad2023.pdf
26.8 MB
The 2023 MAD (Machine Learning, Artificial Intelligence & Data) Landscape – Matt Turck
Source: https://mattturck.com/mad2023/
A selection of GitHub projects
〰️〰️〰️〰️〰️〰️〰️〰️
🔸 Engineering Python
Welcome to Engineering Python. This is a Python programming course for engineers.
This GitHub repository hosts the Jupyter Notebooks and Python source code for the open course on YouTube (http://youtube.com/yongtwang).
A tutorial on how to use these course materials is in this YouTube video: 02C Course Materials and Jupyter Notebook.
〰️〰️〰️〰️〰️〰️〰️〰️
🔸 Fun and useful projects with Python
You can find the corresponding tutorials on my channel: https://www.youtube.com/c/PythonEngineer
〰️〰️〰️〰️〰️〰️〰️〰️
🔸 Python Engineer Roadmap
Python can be used in a lot of computer science fields. In this repository, we have collected resources for each field of computer science that are related to Python.
〰️〰️〰️〰️〰️〰️〰️〰️
🔸 PyTorch Beginner Tutorials from my YouTube channel
• Installation
• Tensor Basics
• Autograd
• Backpropagation
• Gradient Descent With Autograd and Backpropagation
• Training Pipeline: Model, Loss, and Optimizer
• Linear Regression
• Logistic Regression
• Dataset and DataLoader
• Dataset Transforms
• Softmax And Cross Entropy
• Activation Functions
• Feed-Forward Neural Net
• Convolutional Neural Net (CNN)
• Transfer Learning
• Tensorboard
• Save and Load Models
❤1
😐 Docker's New Ultimatum Can Affect Open-Source Projects in a Big, Negative Way
https://news.itsfoss.com/docker-dropping-free-team-orgs/
Docker can do better to accommodate open-source projects; what do you think?
Apache Airflow guides:
▫️Apache Airflow user guide from Sber
▫️GitHub -> GB: Setting Up Data Flows. Apache Airflow
Sber Developers Documentation
Airflow user guide (an orchestrator for ETL tasks) | SberData Platform: a set of integrated services for working with data
Here's Sber's guide to Apache Superset 😁
https://developers.sber.ru/docs/ru/sdp/sdpanalytics/guidelines-reports-SDPBI
Guide to developing reports in SDP BI | SberData Platform: a set of integrated services for working with data
❤2
Forwarded from Как мы делаем Яндекс (How We Build Yandex)
Yandex is open-sourcing YTsaurus, one of its core in-house BigData infrastructure systems. It is a platform for distributed storage and processing of big data.
Maxim Babenko, head of the distributed computing technologies department at Yandex, told the story of how YT came about, why YTsaurus is needed, and where it can be used.
The GitHub repository contains the YTsaurus server code, the k8s-based deployment infrastructure, the system's web interface, and client SDKs for common programming languages: C++, Java, Go, and Python.
Links to the posts on Habr and Medium.
👍4