Pandas vs PySpark DataFrame With Examples
Let's learn the differences between Pandas and PySpark DataFrames: their definitions, features, advantages, how to create them, and how to convert one to the other, with examples.
👉 @devops_dataops
https://sparkbyexamples.com/pyspark/pandas-vs-pyspark-dataframe-with-examples/
Spark By {Examples}
Pandas vs PySpark DataFrame With Examples
What are the differences between Pandas and PySpark DataFrame? Pandas and PySpark are both powerful tools for data manipulation and analysis in Python.
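A minimal sketch of the create-and-convert round trip this post mentions, assuming a local Spark installation with a Java runtime. The sample rows, column names, and helper names (`make_pandas_df`, `pandas_to_spark`, `spark_to_pandas`, `demo`) are hypothetical and not taken from the linked article; `spark.createDataFrame` and `DataFrame.toPandas` are the standard PySpark calls for the two directions.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
ROWS = [("James", 30), ("Anna", 41), ("Robert", 62)]
COLUMNS = ["name", "age"]


def make_pandas_df() -> pd.DataFrame:
    """Build a Pandas DataFrame; it lives in the memory of a single machine."""
    return pd.DataFrame(ROWS, columns=COLUMNS)


def pandas_to_spark(spark, pdf):
    """Convert Pandas -> PySpark; `spark` is an existing SparkSession."""
    return spark.createDataFrame(pdf)


def spark_to_pandas(sdf):
    """Convert PySpark -> Pandas; this collects every row onto the driver."""
    return sdf.toPandas()


def demo():
    """Full round trip. Needs pyspark and Java, so it is not called on import."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("pandas-vs-pyspark-sketch")
        .getOrCreate()
    )
    sdf = pandas_to_spark(spark, make_pandas_df())
    sdf.show()  # distributed DataFrame, evaluated lazily until an action runs
    back = spark_to_pandas(sdf)
    spark.stop()
    return back
```

Because `toPandas()` pulls the whole dataset onto the driver, it is usually reserved for small results such as aggregates or samples.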
Artem Shutak: Inserting into ClickHouse without dying
Presentation
ClickHouse + Spark collection - Altinity Knowledge Base
YouTube
Artem Shutak: Inserting into ClickHouse without dying
More about the SmartData conference: https://jrg.su/aTWU2K
- -
What could be simpler than inserting data into a database?! Yet Odnoklassniki has been doing it for 2 years, and ClickHouse still keeps surprising them.
Artem Shutak is from Odnoklassniki. Their installation is roughly…
GitHub Actions - Introduction to CI/CD
00:00 - What the course is about
03:50 - GitHub intro course
12:35 - Getting started with GitHub Actions
18:20 - Writing the first workflow
29:17 - Automatically testing React
37:57 - What Actions are
48:25 - Making the workflow more complex (practice)
53:40 - Job dependencies and their order
01:00:18 - Context & Events
01:21:19 - Adding a cache
01:28:13 - Matrix
01:35:44 - Artifacts
01:45:25 - Environment & Secrets
https://www.youtube.com/watch?v=e0A2hDObLmg
YouTube
GitHub Actions - Introduction to CI/CD
Sign up and create a reliable Cloud Databases cluster in Selectel at a 30% saving: https://slc.tl/3qwoj
The article and source code are in my Telegram channel. Subscribe:
https://t.me/js_by_vladilen/556
More content on my Boosty: https://boosty.to/vladilen…
This software has an interesting monetization model: it is open source, but some sensible extras are available only in the paid version (users and roles, plus support).
The very idea of low-code platforms appearing as open source is interesting too.
----
Tooljet | Open-source low-code platform to build internal tools
Extensible low-code framework for building business applications. Connect to databases, cloud storages, GraphQL, API endpoints, Airtable, etc and build apps using drag and drop application builder. Built using JavaScript/TypeScript.
https://www.tooljet.com/
www.tooljet.ai
ToolJet | AI-Native Platform for Building Internal Tools
Build AI-powered internal tools faster and easier with our AI-native platform, ToolJet. In minutes, you can create tools that are enterprise-ready.
Kubernetes Explained in 6 Minutes | k8s Architecture - YouTube
https://m.youtube.com/watch?v=TlHvYWVUZyc
YouTube
Kubernetes Explained in 6 Minutes | k8s Architecture
To get better at system design, subscribe to our weekly newsletter: https://bit.ly/3tfAlYD
Check out our bestselling System Design Interview books:
Volume 1: https://amzn.to/3Ou7gkd
Volume 2: https://amzn.to/3HqGozy
ABOUT US:
Covering topics and trends…
ChatGPT Tutorial for Developers - 38 Ways to 10x Your Productivity
https://www.youtube.com/watch?v=sTeoEFzVNSc
YouTube
ChatGPT Tutorial for Developers - 38 Ways to 10x Your Productivity
Learn how to use ChatGPT to 10x your productivity! 38 examples using Python, JavaScript, HTML, CSS, React, SQL and more!
- Subscribe for more ChatGPT tutorials: https://goo.gl/6PYaGF
ChatGPT Desktop App: https://github.com/f/awesome-chatgpt-prompts
ChatGPT…
Data Build Tool (dbt). Transformation in Modern data stack | by Amit Singh Rathore | Jan, 2023 | Dev Genius
https://blog.devgenius.io/data-build-tool-dbt-1f0b03d97cc6
Medium
Data Build Tool (dbt)
Transformation in Modern data stack
How to push an npm package to GitLab; this helped 👍
Publishing your private npm packages to Gitlab NPM Registry
https://shivamarora.medium.com/publishing-your-private-npm-packages-to-gitlab-npm-registry-39d30a791085
Medium
Publishing your private npm packages to Gitlab NPM Registry
Configure npm, yarn, lerna to publish packages to Gitlab Package Registry and use them as dependencies in your project
Another collection of tools: Awesome-Selfhosted
A list of Free Software network services and web applications which can be hosted on your own servers
https://github.com/awesome-selfhosted/awesome-selfhosted
GitHub
GitHub - awesome-selfhosted/awesome-selfhosted: A list of Free Software network services and web applications which can be hosted…
A list of Free Software network services and web applications which can be hosted on your own servers - awesome-selfhosted/awesome-selfhosted
Prescriber-ETL-data-pipeline
An end-to-end ETL data pipeline that uses PySpark parallel processing to handle about 25 million rows of data from a SaaS application, with Apache Airflow as the orchestration tool and various data warehouse technologies, and Apache Superset connected to the DWH to generate BI dashboards for weekly reports.
https://github.com/judeleonard/Prescriber-ETL-data-pipeline
GitHub
GitHub - judeleonard/Prescriber-ETL-data-pipeline: An End-to-End ETL data pipeline that leverages pyspark parallel processing to…
An End-to-End ETL data pipeline that leverages pyspark parallel processing to process about 25 million rows of data coming from a SaaS application using Apache Airflow as an orchestration tool and ...
airflow-docker
This is my Apache Airflow Local development setup on Windows 10 WSL2/Mac using docker-compose. It will also include some sample DAGs and workflows.
https://github.com/anilkulkarni87/airflow-docker
GitHub
GitHub - anilkulkarni87/airflow-docker: This is my Apache Airflow Local development setup on Windows 10 WSL2/Mac using docker-compose.…
This is my Apache Airflow Local development setup on Windows 10 WSL2/Mac using docker-compose. It will also include some sample DAGs and workflows. - anilkulkarni87/airflow-docker
A decent comparison table of metadata management tools
Awesome Data Discovery and Observability
https://github.com/opendatadiscovery/awesome-data-catalogs
GitHub
GitHub - opendatadiscovery/awesome-data-catalogs: 📙 Awesome Data Catalogs and Observability Platforms.
📙 Awesome Data Catalogs and Observability Platforms. - GitHub - opendatadiscovery/awesome-data-catalogs: 📙 Awesome Data Catalogs and Observability Platforms.
Examples from the Apache Airflow 2.0 course
https://github.com/adilkhash/apache-airflow-course-materials
GitHub
GitHub - adilkhash/apache-airflow-course-materials: Apache Airflow 2.0 course
Apache Airflow 2.0 course. Contribute to adilkhash/apache-airflow-course-materials development by creating an account on GitHub.
How to Orchestrate an ETL Data Pipeline with Apache Airflow
https://www.freecodecamp.org/news/orchestrate-an-etl-data-pipeline-with-apache-airflow/
freeCodeCamp.org
How to Orchestrate an ETL Data Pipeline with Apache Airflow
By Aviator Ifeanyichukwu. Data orchestration involves using different tools and technologies together to extract, transform, and load (ETL) data from multiple sources into a central repository. Data orchestration typically involves a combination of t...
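To make the extract, transform, load sequence concrete, here is a rough sketch in plain Python. It is not the article's code and does not use Airflow: the two inline CSV strings stand in for "multiple sources", an in-memory SQLite database stands in for the "central repository", and every name (`ORDERS_CSV`, `REFUNDS_CSV`, `payments`, `run_pipeline`) is hypothetical. In an orchestrator such as Airflow, each of the three functions would typically become its own task.

```python
import csv
import io
import sqlite3

# Hypothetical inputs standing in for two separate sources.
ORDERS_CSV = "order_id,customer,amount\n1,alice,10.50\n2,bob,4.00\n"
REFUNDS_CSV = "order_id,customer,amount\n3,alice,-2.50\n"


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types and normalize customer names."""
    return [
        (int(r["order_id"]), r["customer"].title(), float(r["amount"]))
        for r in rows
    ]


def load(conn, records):
    """Load: write the cleaned records into the central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn):
    """Run extract -> transform -> load for each source, then summarize."""
    for source in (ORDERS_CSV, REFUNDS_CSV):
        load(conn, transform(extract(source)))
    return conn.execute(
        "SELECT customer, SUM(amount) FROM payments "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
```

Calling `run_pipeline(sqlite3.connect(":memory:"))` returns the per-customer totals from the combined sources.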
OpenMetadata vs DataHub
One of the points against DataHub is its annoying way of opening Data Lineage.
Why there can't be a button that expands the whole tree is a mystery to me.
So far, OpenMetadata is leading in the OpenMetadata vs DataHub comparison.
Data Engineering with Python.pdf
10.5 MB
Data Engineering with Python
Packt Publishing
Key Features
▫️Become well-versed in data architectures, data preparation, and data optimization skills with the help of practical examples
▫️Design data models and learn how to extract, transform, and load (ETL) data using Python
▫️Schedule, automate, and monitor complex data pipelines in production
Data Engineering - Open Source Tools/Databases
A curated list of docker-compose files prepared for testing data engineering tools, databases and open source libraries.
Airflow
Cassandra
ClickHouse
Drill
Druid
ELK
Grafana-Prometheus
Hadoop
Kafka
LakeFS
Mariadb
Minio
Postgres
Redis
Spark
Superset
Trino
mongo
https://github.com/irbigdata/data-dockerfiles
GitHub
GitHub - irbigdata/data-dockerfiles: a curated list of docker-compose files prepared for testing data engineering tools, databases…
a curated list of docker-compose files prepared for testing data engineering tools, databases and open source libraries. - irbigdata/data-dockerfiles
Apache Druid in 5 minutes
https://youtu.be/X8ZnwwmCBAA
YouTube
Apache Druid in 5 Minutes
Apache Druid is a real-time analytics database used by 1000s of companies like Netflix, Confluent, Salesforce, and Target. But what's the big deal? Why use Druid instead of a data warehouse - like Snowflake, BigQuery, or Redshift - or an operational database…