Data Engineers
Free Data Engineering Ebooks & Courses
SQL vs Pyspark.pdf
🚀 Master SQL for Data Engineering and Ace Interviews

To succeed as a Data Analyst, focus on these essential SQL topics:

1️⃣ Fundamental SQL Commands
SELECT, FROM, WHERE

GROUP BY, HAVING, LIMIT


2️⃣ Advanced Querying Techniques
Joins: LEFT, RIGHT, INNER, SELF, CROSS

Aggregate Functions: SUM(), MAX(), MIN(), AVG()

Window Functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LEAD(), LAG(), SUM() OVER()

Conditional Logic & Pattern Matching:

CASE statements for conditions

LIKE for pattern matching


Complex Queries: Subqueries, Common Table Expressions (CTEs), temporary tables
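
To make the window functions listed above concrete, here is a minimal sketch that runs them through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The `salaries` table and its rows are made up for illustration. The tied salaries show how ROW_NUMBER(), RANK() and DENSE_RANK() differ.

```python
import sqlite3

# Hypothetical sample data: two people in "data" share the top salary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (emp TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?)",
    [
        ("Asha", "data", 120),
        ("Bilal", "data", 120),
        ("Chen", "data", 100),
        ("Dana", "ops", 90),
        ("Eli", "ops", 80),
    ],
)

query = """
SELECT
    emp,
    dept,
    salary,
    -- emp is added to ORDER BY only to make the row order deterministic
    ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC, emp) AS row_num,
    -- ties share a rank; RANK leaves a gap afterwards, DENSE_RANK does not
    RANK()       OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk,
    DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dense_rnk,
    -- salary of the previous row in the same department (NULL for the first)
    LAG(salary)  OVER (PARTITION BY dept ORDER BY salary DESC, emp) AS prev_salary,
    -- running total within the department
    SUM(salary)  OVER (PARTITION BY dept ORDER BY salary DESC, emp
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM salaries
ORDER BY dept, row_num
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In the "data" department the two tied rows both get rank 1; the next row gets RANK 3 but DENSE_RANK 2.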


3️⃣ Performance Tuning
Optimize queries for better performance

Learn indexing strategies
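
One way to get a feel for indexing is to compare the query plan before and after creating an index. The sketch below does this in SQLite through Python's sqlite3 module. The `orders` table, its columns and the index name are hypothetical, and other databases expose plans through their own EXPLAIN variants with different output.

```python
import sqlite3

# Hypothetical table with 1,000 orders spread over 100 customers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)


def plan(sql):
    """Return the human-readable steps of SQLite's plan for a query."""
    # The last column of each EXPLAIN QUERY PLAN row is the description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


lookup = "SELECT order_id, amount FROM orders WHERE customer_id = 42"

plan_before = plan(lookup)  # no index yet: SQLite scans the whole table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(lookup)   # now SQLite can search the index instead

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a scan of `orders` and the second a search using `idx_orders_customer`.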


4️⃣ Practical Applications
Solve case studies from Ankit Bansal's YouTube channel

Watch 10-15 minute tutorials, practice along for hands-on learning


5️⃣ End-to-End Projects
Search "Data Analysis End-to-End Projects Using SQL" on YouTube

Practice the full process: data extraction ➡️ cleaning ➡️ analysis


6️⃣ Real-World Data Analysis
Analyze real datasets for insights

Practice cleaning, handling missing values, and dealing with outliers


7️⃣ Advanced Data Manipulation
Use advanced SQL functions for transforming raw data into insights

Practice combining data from multiple sources


8️⃣ Reporting & Dashboards
Build impactful reports and dashboards using SQL and Power BI


9️⃣ Interview Preparation
Practice common SQL interview questions

Solve exercises and coding challenges


🔑 Pro Tip: Hands-on practice is key! Apply these steps to real projects and datasets to strengthen your expertise and confidence.

#SQL #DataEngineer #CareerGrowth
🔥 Working with INTERSECT and EXCEPT in SQL

When dealing with datasets in SQL, you often need to find common records in two tables or determine the differences between them. For these purposes, SQL provides two useful operators: INTERSECT and EXCEPT. Let's take a closer look at how they work.

🔻 The INTERSECT Operator
The INTERSECT operator is used to find rows that are present in both queries. It works like the intersection of sets in mathematics, returning only those records that exist in both datasets.

Example:
SELECT column1, column2
FROM table1
INTERSECT
SELECT column1, column2
FROM table2;

This will return rows that appear in both table1 and table2.

Key Points:
- The INTERSECT operator automatically removes duplicate rows from the result.
- The selected columns must have compatible data types.
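
To try the INTERSECT query above end to end, here is a small sketch using Python's built-in sqlite3 module. The rows are invented for illustration. Both tables contain duplicates on purpose, so you can see that each common row comes back only once.

```python
import sqlite3

# Invented sample rows; (1, 'a') and (2, 'b') exist in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (column1 INTEGER, column2 TEXT);
CREATE TABLE table2 (column1 INTEGER, column2 TEXT);
INSERT INTO table1 VALUES (1, 'a'), (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO table2 VALUES (1, 'a'), (2, 'b'), (2, 'b'), (4, 'd');
""")

common_rows = conn.execute("""
SELECT column1, column2 FROM table1
INTERSECT
SELECT column1, column2 FROM table2
ORDER BY column1
""").fetchall()

print(common_rows)  # [(1, 'a'), (2, 'b')]
```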

🔻 The EXCEPT Operator
The EXCEPT operator is used to find rows that are present in the first query but not in the second. This is similar to the difference between sets, returning only those records that exist in the first dataset but are missing from the second.

Example:
SELECT column1, column2
FROM table1
EXCEPT
SELECT column1, column2
FROM table2;

Here, the result will include rows that are in table1 but not in table2.

Key Points:
- The EXCEPT operator also removes duplicate rows from the result.
- As with INTERSECT, the columns must have compatible data types.
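
Unlike INTERSECT, the order of the two queries matters for EXCEPT. The sketch below (sqlite3 again, with invented rows) runs the query in both directions to show the difference.

```python
import sqlite3

# Invented sample rows; only (1, 'a') is shared by both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (column1 INTEGER, column2 TEXT);
CREATE TABLE table2 (column1 INTEGER, column2 TEXT);
INSERT INTO table1 VALUES (1, 'a'), (3, 'c'), (3, 'c');
INSERT INTO table2 VALUES (1, 'a'), (4, 'd');
""")

# Rows in table1 that are missing from table2 (the duplicate collapses).
only_in_table1 = conn.execute("""
SELECT column1, column2 FROM table1
EXCEPT
SELECT column1, column2 FROM table2
""").fetchall()

# Swapping the queries answers the opposite question.
only_in_table2 = conn.execute("""
SELECT column1, column2 FROM table2
EXCEPT
SELECT column1, column2 FROM table1
""").fetchall()

print(only_in_table1)  # [(3, 'c')]
print(only_in_table2)  # [(4, 'd')]
```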

📊 What's the Difference Between UNION, INTERSECT, and EXCEPT?
- UNION combines the rows from both queries and removes duplicates (use UNION ALL to keep them).
- INTERSECT returns only the rows present in both queries.
- EXCEPT returns rows from the first query that are not found in the second.

📌 Real-Life Examples
1. Finding common customers. Use INTERSECT to identify customers who have made purchases both online and in physical stores.
2. Determining unique products. Use EXCEPT to find products that are sold in one store but not in another.
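
The two scenarios above could look like this in practice. This is only a sketch: the table names (`online_orders`, `store_orders`, `store_a_stock`, `store_b_stock`), their columns and the data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE online_orders (customer_id INTEGER, order_total REAL);
CREATE TABLE store_orders  (customer_id INTEGER, order_total REAL);
CREATE TABLE store_a_stock (product TEXT);
CREATE TABLE store_b_stock (product TEXT);
INSERT INTO online_orders VALUES (1, 20.0), (2, 35.5), (2, 12.0), (3, 8.0);
INSERT INTO store_orders  VALUES (2, 50.0), (3, 15.0), (4, 99.0);
INSERT INTO store_a_stock VALUES ('kettle'), ('lamp'), ('mug');
INSERT INTO store_b_stock VALUES ('lamp'), ('rug');
""")

# 1. Customers who bought both online and in a physical store.
both_channels = [row[0] for row in conn.execute("""
SELECT customer_id FROM online_orders
INTERSECT
SELECT customer_id FROM store_orders
ORDER BY customer_id
""")]

# 2. Products stocked by store A but not by store B.
only_in_store_a = [row[0] for row in conn.execute("""
SELECT product FROM store_a_stock
EXCEPT
SELECT product FROM store_b_stock
ORDER BY product
""")]

print(both_channels)    # [2, 3]
print(only_in_store_a)  # ['kettle', 'mug']
```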

By using INTERSECT and EXCEPT, you can simplify data analysis and work more flexibly with sets, making it easier to solve tasks related to finding intersections and differences between datasets.

Happy querying!
Life of a Data Engineer...


Business user: Can we add a filter to this dashboard? It will help us track a critical metric.
Me: Sure, this should be a quick one.

Next day:

I quickly opened the dashboard to look for the column in its existing data sources. Column not found.

I spent a couple of hours identifying the data source and working out how to bring the column into the existing data pipeline that feeds the dashboard (table granularity, join conditions, etc.).

Then came the pipeline changes, data model changes, dashboard changes, and validation/testing.

Finally, I deployed to production and sent the user a simple email saying the filter had been added.

A small change in the front end, but a lot of work in the back end to bring that column to life.

Never underestimate data engineers and data pipelines 💪
Data Engineering Roadmap
Quick comparison
Data Science Libraries
SQL Mindmap
Data Science Packages
These are the Top 5 Most Common SQL Questions for Data Engineering:


1. Number of rows returned when two tables are joined with each type of join
2. Rolling sum and Nth-highest-salary questions
3. Lag/Lead based questions e.g., consecutive months of increasing sales or YoY growth
4. Query to find employees who earn more than their managers
5. Removing duplicates from a table
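
Here is one possible way to approach questions 5, 4 and 2 from the list above, sketched with Python's sqlite3 module. The `employees` table and its data are invented, and interviewers may expect a different dialect or technique (for example ROW_NUMBER() for deduplication), so treat this as a starting point only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER, name TEXT, salary INTEGER, manager_id INTEGER);
INSERT INTO employees VALUES
    (1, 'Maya',  150, NULL),
    (2, 'Noor',  170, 1),
    (3, 'Omar',  120, 1),
    (4, 'Priya', 130, 3),
    (4, 'Priya', 130, 3);   -- accidental duplicate row
""")

# Question 5: remove duplicates, keeping the first physical row of each group.
# rowid is SQLite-specific; other databases need ROW_NUMBER() or a key column.
conn.execute("""
DELETE FROM employees
WHERE rowid NOT IN (
    SELECT MIN(rowid) FROM employees
    GROUP BY id, name, salary, manager_id
)
""")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# Question 4: employees who earn more than their manager (self join).
earn_more = [row[0] for row in conn.execute("""
SELECT e.name
FROM employees AS e
JOIN employees AS m ON e.manager_id = m.id
WHERE e.salary > m.salary
ORDER BY e.name
""")]

# Question 2: Nth highest salary via DENSE_RANK (here N = 2).
n = 2
nth_salary = conn.execute("""
SELECT DISTINCT salary
FROM (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
) AS ranked
WHERE rnk = ?
""", (n,)).fetchone()[0]

print(remaining)   # 4
print(earn_more)   # ['Noor', 'Priya']
print(nth_salary)  # 150
```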


Key Takeaways:
- Master window functions and joins
- Practice medium to hard SQL questions regularly

Getting good at SQL will pay off in the long run! 💪

Join our WhatsApp channel of Data Engineers: https://whatsapp.com/channel/0029Vaovs0ZKbYMKXvKRYi3C
FREE RESOURCES TO LEARN DATA ENGINEERING
👇👇

Big Data and Hadoop Essentials free course

https://bit.ly/3rLxbul

Data Engineer: Prepare Financial Data for ML and Backtesting FREE UDEMY COURSE
[4.6 stars out of 5]

https://bit.ly/3fGRjLu

Understanding Data Engineering from Datacamp

https://clnk.in/soLY

Data Engineering Free Books

https://ia600201.us.archive.org/4/items/springer_10.1007-978-1-4419-0176-7/10.1007-978-1-4419-0176-7.pdf

https://www.darwinpricing.com/training/Data_Engineering_Cookbook.pdf

Big Book of Data Engineering Free book

https://databricks.com/wp-content/uploads/2021/10/Big-Book-of-Data-Engineering-Final.pdf

https://aimlcommunity.com/wp-content/uploads/2019/09/Data-Engineering.pdf

The Data Engineer's Guide to Apache Spark

https://t.me/datasciencefun/783?single

Data Engineering with Python

https://t.me/pythondevelopersindia/343

Data Engineering Projects -

1. End-To-End From Web Scraping to Tableau - https://lnkd.in/ePMw63ge

2. Building Data Model and Writing ETL Job - https://lnkd.in/eq-e3_3J

3. Data Modeling and Analysis using Semantic Web Technologies - https://lnkd.in/e4A86Ypq

4. ETL Project in Azure Data Factory - https://lnkd.in/eP8huQW3

5. ETL Pipeline on AWS Cloud - https://lnkd.in/ebgNtNRR

6. Covid Data Analysis Project - https://lnkd.in/eWZ3JfKD

7. YouTube Data Analysis (End-To-End Data Engineering Project) - https://lnkd.in/eYJTEKwF

8. Twitter Data Pipeline using Airflow - https://lnkd.in/eNxHHZbY

9. Sentiment Analysis of Twitter: Kafka and Spark Structured Streaming - https://lnkd.in/esVAaqtU

ENJOY LEARNING 👍👍
Google FREE Certification Courses 😍

Data analytics is a must-have skill in today's digital era, and Google offers exceptional free courses to help you excel:

- Google Analytics Certification
- Google Analytics for Power Users
- Advanced Google Analytics

Link 👇:

https://pdlink.in/423LMom

Enroll for FREE & Get Certified 🎓
Tools for Data Engineers 👆
Languages used by data engineers:

📍 SQL
📍 Python
📍 Scala
📍 PySpark
📍 Spark SQL
Here are some incredible platforms where you can download datasets for your project:


Our World in Data: https://ourworldindata.org/

World Health Organization: https://www.who.int/data/gho

Statcounter: https://gs.statcounter.com/

Food and Agriculture Organization of the UN (FAO): https://www.fao.org/home/en

World Bank: https://data.worldbank.org/
Get Your Dream Job at Amazon, Google, Microsoft, NVIDIA, and Meta (Facebook) with these comprehensive resources 😍

1️⃣ Amazon Interviewing Guide
2️⃣ Google Interview Tips
3️⃣ Microsoft Hiring Tips
4️⃣ NVIDIA Hiring Process
5️⃣ Meta Onsite SWE Prep Guide

Link 👇:

https://pdlink.in/40OSJJ6

Crack the Interview & Get Your Dream Job at Top MNCs
Flow chart of commonly used statistical tests
FREE Certification Courses 😍

1) Generative AI

2) Big Data Artificial Intelligence

3) Microsoft AI for Beginners

4) Prompt Engineering for ChatGPT

Link 👇:

https://pdlink.in/40Fbg9d

Enroll for FREE & Get Certified 🎓