Top 10 Sites to review your resume for free:
1. Zety Resume Builder
2. Resumonk
3. Free Resume Builder
4. VisualCV
5. Cvmaker
6. ResumUP
7. Resume Genius
8. Resumebuilder
9. Resume Baking
10. Enhancv
COMMON TERMINOLOGIES IN PYTHON - PART 1
Have you ever gotten into a discussion with a programmer before? Did you find some of the terms mentioned strange, or did you not fully understand them?
In this series, we will be looking at common terminology in Python.
It is important to know these terms so you can properly explain your code to people and instantly understand what others mean when they use them. Below are a few:
IDLE (Integrated Development and Learning Environment) - this is an environment that allows you to easily write Python code. IDLE can be used to execute a single statement and to create, modify, and execute Python scripts.
Python Shell - This is the interactive environment that allows you to type in Python code and execute it immediately.
System Python - This is the version of Python that comes with your operating system.
Prompt - usually represented by the symbol ">>>"; it simply means that Python is waiting for you to give it some instructions.
REPL (Read-Evaluate-Print Loop) - this refers to the sequence of events in your interactive window, in the form of a loop (Python reads the code you enter > the code is evaluated > the output is printed).
Argument - this is a value that is passed to a function when it is called, e.g. print("Hello World")... "Hello World" is the argument being passed.
Function - this is a block of code that takes some input, known as arguments, processes that input, and produces an output called a return value. E.g. print("Hello World")... print is the function.
Return Value - this is the value that a function hands back to the calling script or function when it completes its task (in other words, its output). E.g.
>>> len("Hello World")
11
Where 11 is your return value.
Note: a return value can be any Python object - a number, a string, a list, None, and so on. (print() itself actually returns None; the "Hello World" it displays is printed output, not a return value.)
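To tie Function, Argument, and Return Value together, here is a minimal sketch typed at the prompt (the function name greet is made up for illustration):
>>> def greet(name):
...     return "Hello, " + name    # this string is the return value
...
>>> greet("Ada")                   # "Ada" is the argument passed to greet
'Hello, Ada'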
Script - This is a text file where you store your Python code so that you can execute all of it with a single command.
Script file - the file (typically with a .py extension) that contains your Python script.
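As a quick illustration (the file name hello.py is just an example), a script file could contain:
# hello.py - a minimal Python script
name = "World"
print("Hello, " + name)
Running it from a terminal with "python hello.py" executes all of the code with a single command and prints Hello, World.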
How long are coding interviews?
The phone screen portion of the coding interview typically lasts up to one hour. The second, more technical part of the interview can take multiple hours.
Where can I practice coding?
There are many ways to practice coding and prepare for your coding interview. LeetCode provides practice opportunities in more than 14 languages and more than 1,500 sample problems. Applicants can also practice their coding skills and interview prep with HackerRank.
How do I know if my coding interview went well?
There are a variety of indicators that your coding interview went well. These may include going over the allotted time, being introduced to additional team members, and receiving a quick response to your thank you email.
In class, there are some students who are really good at coding from the start, and seeing them can make us feel quite demotivated, especially since they often appear overconfident.
But it's not important how much someone already knows. If you start and practice consistently, it's not that tough to match their level or even surpass them.
And often, these overconfident people don’t perform as well as you can because you have the desire to learn, while they think they already know everything.
So, my friend, don’t get demotivated—just give it time!
When you're studying DSA, you probably think, "This won't be directly used in the actual work of a company, so why am I even doing this?"
And in life, where will this even come in handy? Well, it won't be useful directly, but the hard work you're putting in—sitting day and night solving questions—that habit of working hard will pay off.
It's not really about DSA, but about the effort you're willing to give that will decide which company you land your internship or placement in ❤️
Solving DSA questions by understanding patterns (a short Python sketch of the first pattern follows the list):
If the input array is sorted then
- Binary search
- Two pointers
If asked for all permutations/subsets then
- Backtracking
If given a tree then
- DFS
- BFS
If given a graph then
- DFS
- BFS
If given a linked list then
- Two pointers
If recursion is banned then
- Stack
If must solve in-place then
- Swap corresponding values
- Store one or more different values in the same pointer
If asked for maximum/minimum subarray/subset/options then
- Dynamic programming
If asked for top/least K items then
- Heap
- QuickSelect
If asked for common strings then
- Map
- Trie
Else
- Map/Set for O(1) time & O(n) space
- Sort input for O(n log n) time and O(1) space
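As a quick illustration of the first pattern (sorted input -> binary search or two pointers), here is a minimal Python sketch; the function names and the pair-sum task are just examples:
def binary_search(arr, target):
    # classic binary search on a sorted list; returns the index of target, or -1
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def pair_with_sum(arr, target):
    # two pointers moving inward on a sorted list; returns a pair summing to target, or None
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        s = arr[lo] + arr[hi]
        if s == target:
            return arr[lo], arr[hi]
        elif s < target:
            lo += 1
        else:
            hi -= 1
    return None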
List of most asked Programming Interview Questions.
Are you preparing for a coding interview? This tweet is for you. It contains a list of the most asked interview questions from each topic, with one worked sketch after the list.
Arrays
- How is an array sorted using quicksort?
- How do you reverse an array?
- How do you remove duplicates from an array?
- How do you find the 2nd largest number in an unsorted integer array?
Linked Lists
- How do you find the length of a linked list?
- How do you reverse a linked list?
- How do you find the third node from the end?
- How are duplicate nodes removed in an unsorted linked list?
Strings
- How do you check if a string contains only digits?
- How can a given string be reversed?
- How do you find the first non-repeated character?
- How do you find duplicate characters in strings?
Binary Trees
- How are all leaves of a binary tree printed?
- How do you check if a tree is a binary search tree?
- How is a binary search tree implemented?
- Find the lowest common ancestor in a binary tree?
Graph
- How to detect a cycle in a directed graph?
- How to detect a cycle in an undirected graph?
- Find the total number of strongly connected components?
- Find whether a path exists between two nodes of a graph?
- Find the minimum number of swaps required to sort an array.
Dynamic Programming
1. Find the longest common subsequence?
2. Find the longest common substring?
3. Coin change problem?
4. Box stacking problem?
5. Count the number of ways to cover a distance?
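To show what an answer might look like, here is a minimal Python sketch for one of the classics above, "How do you reverse a linked list?" (the Node class and names are just for illustration):
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def reverse_linked_list(head):
    # iteratively re-point each node at its predecessor
    prev, curr = None, head
    while curr:
        nxt = curr.next    # remember the rest of the list
        curr.next = prev   # reverse the link
        prev, curr = curr, nxt
    return prev            # prev is the new head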
20 recently asked Kafka interview questions (a minimal producer/consumer sketch follows the list).
- How do you create a topic in Kafka using the Confluent CLI?
- Explain the role of the Schema Registry in Kafka.
- How do you register a new schema in the Schema Registry?
- What is the importance of key-value messages in Kafka?
- Describe a scenario where using a random key for messages is beneficial.
- Provide an example where using a constant key for messages is necessary.
- Write a simple Kafka producer code that sends JSON messages to a topic.
- How do you serialize a custom object before sending it to a Kafka topic?
- Describe how you can handle serialization errors in Kafka producers.
- Write a Kafka consumer code that reads messages from a topic and deserializes them from JSON.
- How do you handle deserialization errors in Kafka consumers?
- Explain the process of deserializing messages into custom objects.
- What is a consumer group in Kafka, and why is it important?
- Describe a scenario where multiple consumer groups are used for a single topic.
- How does Kafka ensure load balancing among consumers in a group?
- How do you send JSON data to a Kafka topic and ensure it is properly serialized?
- Describe the process of consuming JSON data from a Kafka topic and converting it to a usable format.
- Explain how you can work with CSV data in Kafka, including serialization and deserialization.
- Write a Kafka producer code snippet that sends CSV data to a topic.
- Write a Kafka consumer code snippet that reads and processes CSV data from a topic.
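For the producer/consumer questions above, here is a minimal sketch using the kafka-python library (the topic name, broker address, payload, and group id are placeholders; confluent-kafka would look similar):
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer: serialize Python dicts to JSON bytes before sending
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", {"order_id": 1, "amount": 99.5})
producer.flush()

# Consumer: deserialize JSON bytes back into Python dicts
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="order-processors",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)   # e.g. {'order_id': 1, 'amount': 99.5}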
All the best 👍🏻👍🏻
Important Data Engineering Concepts for Interviews
1. ETL Processes: Understand the ETL (Extract, Transform, Load) process, including how to design and implement efficient pipelines to move data from various sources to a data warehouse or data lake. Familiarize yourself with tools like Apache NiFi, Talend, and AWS Glue.
2. Data Warehousing: Know the fundamentals of data warehousing, including the star schema, snowflake schema, and how to design a data warehouse that supports efficient querying and reporting. Learn about popular data warehousing solutions like Amazon Redshift, Google BigQuery, and Snowflake.
3. Data Modeling: Master data modeling concepts, including normalization and denormalization, to design databases that are optimized for both read and write operations. Understand entity-relationship (ER) diagrams and how to use them to model data relationships.
4. Big Data Technologies: Gain expertise in big data frameworks like Apache Hadoop and Apache Spark for processing large datasets. Understand the roles of HDFS, MapReduce, Hive, and Pig in the Hadoop ecosystem, and how Spark’s in-memory processing can accelerate data processing.
5. Data Lakes: Learn about data lakes as a storage solution for raw, unstructured, and semi-structured data. Understand the key differences between data lakes and data warehouses, and how to use tools like Apache Hudi and Delta Lake to manage data lakes efficiently.
6. SQL and NoSQL Databases: Be proficient in SQL for querying and managing relational databases like MySQL, PostgreSQL, and Oracle. Also, understand when and how to use NoSQL databases like MongoDB, Cassandra, and DynamoDB for storing and querying unstructured or semi-structured data.
7. Data Pipelines: Learn how to design, build, and manage data pipelines that automate the flow of data from source systems to target destinations. Familiarize yourself with orchestration tools like Apache Airflow, Luigi, and Prefect for managing complex workflows (a minimal Airflow sketch follows this list).
8. APIs and Data Integration: Understand how to integrate data from various APIs and third-party services into your data pipelines. Learn about RESTful APIs, GraphQL, and how to handle data ingestion from external sources securely and efficiently.
9. Data Streaming: Gain knowledge of real-time data processing using streaming technologies like Apache Kafka, Apache Flink, and Amazon Kinesis. Learn how to build systems that can process and analyze data in real time as it flows through the system.
10. Cloud Platforms: Get familiar with cloud-based data engineering services offered by AWS, Azure, and Google Cloud. Understand how to use services like AWS S3, Azure Data Lake, Google Cloud Storage, AWS Redshift, and BigQuery for data storage, processing, and analysis.
11. Data Governance and Security: Learn best practices for data governance, including how to implement data quality checks, lineage tracking, and metadata management. Understand data security concepts like encryption, access control, and GDPR compliance to protect sensitive data.
12. Automation and Scripting: Be proficient in scripting languages like Python, Bash, or PowerShell to automate repetitive tasks, manage data pipelines, and perform ad-hoc data processing.
13. Data Versioning and Lineage: Understand the importance of data versioning and lineage for tracking changes to data over time. Learn how to use tools like Apache Atlas or DataHub for managing metadata and ensuring traceability in your data pipelines.
14. Containerization and Orchestration: Learn how to deploy and manage data engineering workloads using containerization tools like Docker and orchestration platforms like Kubernetes. Understand the benefits of using containers for scaling and maintaining consistency across environments.
15. Monitoring and Logging: Implement logging for data pipelines to ensure they run smoothly and efficiently. Familiarize yourself with tools like Prometheus, Grafana, etc. for real-time monitoring and troubleshooting.
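As a minimal illustration of concept 7 (orchestrating an ETL pipeline with Apache Airflow), here is a sketch of a daily DAG; the task functions are empty placeholders, and a real pipeline would pass data between tasks via XCom or external storage:
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # pull raw records from a source system (placeholder)
    print("extracting...")

def transform():
    # clean and reshape the extracted data (placeholder)
    print("transforming...")

def load():
    # write the results to a warehouse or lake (placeholder)
    print("loading...")

with DAG(
    dag_id="daily_etl_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3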
PySpark Interview Questions!!
Interviewer: "How would you remove duplicates from a large dataset in PySpark?"
Candidate: "To remove duplicates from a large dataset in PySpark, I would follow these steps:
Step 1: Load the dataset into a DataFrame
Step 2: Check for duplicates
Step 3: Partition the data to optimize performance
Step 4: Remove duplicates using the
Step 5: Cache the resulting DataFrame to avoid recomputing
Step 6: Save the cleaned dataset
Interviewer: "That's correct! Can you explain why you partitioned the data in Step 3?"
Candidate: "Yes, partitioning the data helps to distribute the computation across multiple nodes, making the process more efficient and scalable."
Interviewer: "Great answer! Can you also explain why you cached the resulting DataFrame in Step 5?"
Candidate: "Caching the DataFrame avoids recomputing the entire dataset when saving the cleaned data, which can significantly improve performance."
Interviewer: "Excellent! You have demonstrated a clear understanding of optimizing duplicate removal in PySpark."
Interviewer: "How would you remove duplicates from a large dataset in PySpark?"
Candidate: "To remove duplicates from a large dataset in PySpark, I would follow these steps:
Step 1: Load the dataset into a DataFrame
df = spark.read.csv("path/to/data.csv", header=True, inferSchema=True)Step 2: Check for duplicates
duplicate_count = df.count() - df.dropDuplicates().count()
print(f"Number of duplicates: {duplicate_count}")
Step 3: Partition the data to optimize performance
df_repartitioned = df.repartition(100)Step 4: Remove duplicates using the
dropDuplicates() methoddf_no_duplicates = df_repartitioned.dropDuplicates()Step 5: Cache the resulting DataFrame to avoid recomputing
df_no_duplicates.cache()Step 6: Save the cleaned dataset
df_no_duplicates.write.csv("path/to/cleaned/data.csv", header=True)Interviewer: "That's correct! Can you explain why you partitioned the data in Step 3?"
Candidate: "Yes, partitioning the data helps to distribute the computation across multiple nodes, making the process more efficient and scalable."
Interviewer: "Great answer! Can you also explain why you cached the resulting DataFrame in Step 5?"
Candidate: "Caching the DataFrame avoids recomputing the entire dataset when saving the cleaned data, which can significantly improve performance."
Interviewer: "Excellent! You have demonstrated a clear understanding of optimizing duplicate removal in PySpark."