Tech Rumors
dev.to

1. PySpark: A brief analysis to the most common words in Dracula, by Bram Stoker
Note: this article is also available in Portuguese 🌎.

A landmark in Gothic literature, the iconic novel Dracula, written by Bram Stoker in 1897, stirs the emotions of people across the world. Today, to introduce Spark's new concepts and features, we will develop a brief notebook to analyze the most common words in this classic book 🧛🏼‍♂️.

To do this, we will write a notebook in Google Colab, a cloud service built by Google to encourage machine learning and artificial intelligence research…
#python #dataengineering #spark #datascience
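
The teaser above is cut off before any code appears, so here is a minimal sketch of the underlying idea, counting the most common words in a text. It uses plain Python (`collections.Counter`) rather than PySpark, and the function name `most_common_words`, the tokenizing regex, and the sample sentence are all assumptions made for illustration, not taken from the article.

```python
import re
from collections import Counter


def most_common_words(text, n=10, stop_words=frozenset()):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Count every word that is not in the (optional) stop-word set.
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


# Hypothetical sample text; a real run would read the full book instead.
sample = "The count saw the castle and the count smiled"
top_two = most_common_words(sample, n=2)
print(top_two)
```

In a Spark notebook the same split, filter, and count steps would typically be expressed as distributed transformations; the sketch only shows the logic on a single string.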

2. Async Streams in WebAssembly with WasmRS
TL;DR: WasmRS is an implementation of RSocket for WebAssembly giving you reactive, async streams in and out of WASM modules.
GitHub | Protocol details | Rust source | Go source

WebAssembly has immense potential but it is hardly user-friendly. It's making strides but what we have to work with today is a slog. Baseline WebAssembly only works with integers and floating point values. Your typical "Hello world" is a cumbersome mess of reading raw memory and dealing with bytes. Once you figure out ho…
#webassembly #rust #rx #streaming
dev.to

1. Building a Data Lakehouse for Analyzing Elon Musk Tweets using MinIO, Apache Airflow, Apache Drill and Apache Superset
Every act of conscious learning requires the willingness to suffer an injury to one's self-esteem. That is why young children, before they are aware of their own self-importance, learn so easily.
Thomas Szasz

Motivation

A Data Lakehouse is a modern data architecture that combines the scalability and flexibility of a data lake with the governance and performance of a data warehouse. This approach allows organizations to store and analyze large amounts of structured and unstructured data in a s…
#dataengineering #datascience #python #database

2. How to Make a Color Picker in JavaScript?
To create an image color picker using JavaScript, CSS, and HTML, you will need to create an HTML file that includes an image and a color picker element (such as a color input or a color wheel).

You can use CSS to style the image and the color picker. Finally, you can use JavaScript to attach an event listener to the color picker element, so that when the user selects a color, it updates the image's color accordingly.

HTML Code

Here is a sample HTML structure for the image color picker:

id…
#webdev #javascript #tutorial #html
dev.to

1. Web Scraping With Puppeteer for Total Noobs
Web scraping is something I never thought I'd do. I'm primarily a UI developer, although my career started as a backend developer. I've never been asked at a job to perform any scraping tasks, and I'd never had a personal project that required me to scrape some data until recently. I'll share what I've learned, which is honestly probably just scratching the surface of what you can do with a technology like Puppeteer. In this post I'll walk you through installing and writing a Node.js script that…
#javascript #beginners #tutorial #webdev

2. Data Pipelines with Great Expectations | Introduction
My background is mostly in web development, but I am learning data engineering because I am interested in business intelligence and want to level-up my knowledge. One of the things that I have recently discovered is data validation within a data pipeline. Let me share a quick example from a previous job I had that illustrates the need for data validation in a data pipeline.

Example of a faulty data pipeline

We had a web app that allowed our clients to log into an account and view performance d…
#datascience #dataengineering #datavalidation #greatexpectations
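
As a rough illustration of what data validation inside a pipeline can look like, the sketch below checks each record before it is passed on. It is plain Python and does not use the Great Expectations API; the field names `client_id` and `clicks`, the rules, and the function `validate_rows` are hypothetical.

```python
def validate_rows(rows):
    """Split rows into valid records and (index, reason) error pairs."""
    valid, errors = [], []
    for index, row in enumerate(rows):
        # Rule 1 (assumed): every record must carry a client identifier.
        if row.get("client_id") is None:
            errors.append((index, "client_id is missing"))
        # Rule 2 (assumed): clicks must be a non-negative integer.
        elif not isinstance(row.get("clicks"), int) or row["clicks"] < 0:
            errors.append((index, "clicks must be a non-negative integer"))
        else:
            valid.append(row)
    return valid, errors


# Hypothetical input: one good record and two bad ones.
rows = [
    {"client_id": 1, "clicks": 12},
    {"client_id": None, "clicks": 3},
    {"client_id": 2, "clicks": -5},
]
valid, errors = validate_rows(rows)
print(valid)
print(errors)
```

Failing records are collected rather than raised, so a pipeline step could log them or halt, depending on how strict it needs to be.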

3. A Comprehensive Guide to Cloud Migration from On-Prem Datacenter to AWS
Cloud migration is the process of moving data, applications, or other business elements from an on-premise environment to a cloud-based environment. Migrating to Amazon Web Services (AWS) is a popular choice for organizations looking to take advantage of the benefits of cloud computing.

Phases of Cloud Migration

Assessment: In this phase, organizations conduct a thorough assessment of their existing on-premise infrastructure and identify the elements that need to be migrated to AWS. The asses…
#aws #cloudmigration #cloud #migration
dev.to

Important Questions related to Data Engineering
Can a database be used as a DWH? Yes, a database can be used as a data warehouse. Even before the data...
#data #dataengineering #pipelines #questions
dev.to

Structure Query Language
What is SQL? SQL (Structured Query Language) is a programming language used to communicate with...
#sql #data #dataengineering #blog
dev.to

Mage Battlegrounds: Craft insights from real-time customer behavior analysis
You're invited to participate in the very first Mage Battlegrounds: Craft insights from real-time...
#dataengineering #hackathon #mage #mageai
dev.to

Unleashing the Magic of Job Schedulers: How to Tame Your Code and Save Your Sanity
Once upon a time, in a land far, far away, software engineers were manually running their code on...
#beginners #webdev #dataengineering #schedule
dev.to

AWS for Data Engineering Projects
In the era of big data, organizations are constantly seeking new ways to manage and analyze massive...
#aws #dataengineering #cloud #imsampro
dev.to

Data sources episode 2: AWS S3 to Postgres Data Sync using Singer
TLDR: AWS S3 is a widely used option for data lakes; let’s see how Singer helps Data...
#dataengineering #community #awss3 #singer
dev.to

Apache Flink episode 1: A comprehensive introduction
TLDR: Apache Flink is a powerful open-source stream processing framework for big data,...
#dataengineering #apacheflink #data #opensource
dev.to

Python Data Lake Management: Complete Guide with Delta Lake, Apache Arrow, and PySpark
Learn Python data lake management with practical examples. Explore Delta Lake, Apache Arrow, Dask, and PySpark implementations for efficient data handling. Includes code samples and best practices. #Python #DataEngineering
dev.to

Real-Time Data Processing with PyFlink and Redpanda
Just leveled up my #DataEngineering skills by building real-time data pipelines with PyFlink and...
dev.to

🚀 Launching my Dev.to journey! 🌟
Hi everyone! I’m Ashish, a web developer diving into the world of #DataEngineering. Expect posts...