Important Data Engineering Concepts for Interviews
1. ETL Processes: Understand the ETL (Extract, Transform, Load) process, including how to design and implement efficient pipelines to move data from various sources to a data warehouse or data lake. Familiarize yourself with tools like Apache NiFi, Talend, and AWS Glue.
2. Data Warehousing: Know the fundamentals of data warehousing, including the star schema, snowflake schema, and how to design a data warehouse that supports efficient querying and reporting. Learn about popular data warehousing solutions like Amazon Redshift, Google BigQuery, and Snowflake.
3. Data Modeling: Master data modeling concepts, including normalization and denormalization, to design databases that are optimized for both read and write operations. Understand entity-relationship (ER) diagrams and how to use them to model data relationships.
4. Big Data Technologies: Gain expertise in big data frameworks like Apache Hadoop and Apache Spark for processing large datasets. Understand the roles of HDFS, MapReduce, Hive, and Pig in the Hadoop ecosystem, and how Spark’s in-memory processing can accelerate data processing.
5. Data Lakes: Learn about data lakes as a storage solution for raw, unstructured, and semi-structured data. Understand the key differences between data lakes and data warehouses, and how to use tools like Apache Hudi and Delta Lake to manage data lakes efficiently.
6. SQL and NoSQL Databases: Be proficient in SQL for querying and managing relational databases like MySQL, PostgreSQL, and Oracle. Also, understand when and how to use NoSQL databases like MongoDB, Cassandra, and DynamoDB for storing and querying unstructured or semi-structured data.
7. Data Pipelines: Learn how to design, build, and manage data pipelines that automate the flow of data from source systems to target destinations. Familiarize yourself with orchestration tools like Apache Airflow, Luigi, and Prefect for managing complex workflows.
8. APIs and Data Integration: Understand how to integrate data from various APIs and third-party services into your data pipelines. Learn about RESTful APIs, GraphQL, and how to handle data ingestion from external sources securely and efficiently.
9. Data Streaming: Gain knowledge of real-time data processing using streaming technologies like Apache Kafka, Apache Flink, and Amazon Kinesis. Learn how to build systems that can process and analyze data in real time as it flows through the system.
10. Cloud Platforms: Get familiar with cloud-based data engineering services offered by AWS, Azure, and Google Cloud. Understand how to use services like AWS S3, Azure Data Lake, Google Cloud Storage, AWS Redshift, and BigQuery for data storage, processing, and analysis.
11. Data Governance and Security: Learn best practices for data governance, including how to implement data quality checks, lineage tracking, and metadata management. Understand data security concepts like encryption, access control, and GDPR compliance to protect sensitive data.
12. Automation and Scripting: Be proficient in scripting languages like Python, Bash, or PowerShell to automate repetitive tasks, manage data pipelines, and perform ad-hoc data processing.
13. Data Versioning and Lineage: Understand the importance of data versioning and lineage for tracking changes to data over time. Learn how to use tools like Apache Atlas or DataHub for managing metadata and ensuring traceability in your data pipelines.
14. Containerization and Orchestration: Learn how to deploy and manage data engineering workloads using containerization tools like Docker and orchestration platforms like Kubernetes. Understand the benefits of using containers for scaling and maintaining consistency across environments.
15. Monitoring and Logging: Implement monitoring and logging for data pipelines to ensure they run smoothly and efficiently. Familiarize yourself with tools like Prometheus and Grafana for real-time monitoring, alerting, and troubleshooting.
PySpark Interview Questions!!
Interviewer: "How would you remove duplicates from a large dataset in PySpark?"
Candidate: "To remove duplicates from a large dataset in PySpark, I would follow these steps:
Step 1: Load the dataset into a DataFrame
Step 2: Check for duplicates
Step 3: Partition the data to optimize performance
Step 4: Remove duplicates using the
Step 5: Cache the resulting DataFrame to avoid recomputing
Step 6: Save the cleaned dataset
Interviewer: "That's correct! Can you explain why you partitioned the data in Step 3?"
Candidate: "Yes, partitioning the data helps to distribute the computation across multiple nodes, making the process more efficient and scalable."
Interviewer: "Great answer! Can you also explain why you cached the resulting DataFrame in Step 5?"
Candidate: "Caching the DataFrame avoids recomputing the entire dataset when saving the cleaned data, which can significantly improve performance."
Interviewer: "Excellent! You have demonstrated a clear understanding of optimizing duplicate removal in PySpark."
Interviewer: "How would you remove duplicates from a large dataset in PySpark?"
Candidate: "To remove duplicates from a large dataset in PySpark, I would follow these steps:
Step 1: Load the dataset into a DataFrame
df = spark.read.csv("path/to/data.csv", header=True, inferSchema=True)
Step 2: Check for duplicates
duplicate_count = df.count() - df.dropDuplicates().count()
print(f"Number of duplicates: {duplicate_count}")
Step 3: Partition the data to optimize performance
df_repartitioned = df.repartition(100)
Step 4: Remove duplicates using the dropDuplicates() method
df_no_duplicates = df_repartitioned.dropDuplicates()
Step 5: Cache the resulting DataFrame to avoid recomputing
df_no_duplicates.cache()
Step 6: Save the cleaned dataset
df_no_duplicates.write.csv("path/to/cleaned/data.csv", header=True)
Interviewer: "That's correct! Can you explain why you partitioned the data in Step 3?"
Candidate: "Yes, partitioning the data helps to distribute the computation across multiple nodes, making the process more efficient and scalable."
Interviewer: "Great answer! Can you also explain why you cached the resulting DataFrame in Step 5?"
Candidate: "Caching the DataFrame avoids recomputing the entire dataset when saving the cleaned data, which can significantly improve performance."
Interviewer: "Excellent! You have demonstrated a clear understanding of optimizing duplicate removal in PySpark."
Date: 18-03-2025
Company name: TCS
Role: Data Scientist
01. What is slicing in Python?
Answer- Slicing is used to access parts of sequences like lists, tuples, and strings. The syntax of slicing is [start:end:step]; the step can be omitted. Writing [start:end] returns all elements of the sequence from start (inclusive) up to end-1. If start or end is a negative number -i, it refers to the ith element from the end. The step indicates the jump, i.e., how many elements to skip between items.
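A quick illustration of these rules (the list and string below are just example data):
nums = [10, 20, 30, 40, 50, 60]
print(nums[1:4])       # [20, 30, 40]  -> start inclusive, end exclusive
print(nums[:3])        # [10, 20, 30]  -> omitted start defaults to 0
print(nums[-3:])       # [40, 50, 60]  -> negative index counts from the end
print(nums[::2])       # [10, 30, 50]  -> step of 2 skips every other element
print("python"[::-1])  # nohtyp        -> negative step reverses the string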
02. What do you mean by ensemble learning?
Answer- Ensemble learning is the technique of strategically building and combining multiple models, such as classifiers, to solve a specific computational problem. Ensemble methods are also known as committee-based learning or learning multiple classifier systems. One of the most common examples of ensemble modeling is the random forest, where several decision trees are combined to predict the outcome.
03. Why is the KNN Algorithm known as Lazy Learner?
Answer- When the KNN algorithm receives the training data, it does not learn and build a model; it simply stores the data. Instead of deriving a discriminative function from the training data, it follows instance-based learning and uses the training data only when it actually needs to make predictions on unseen datasets. Because KNN does not learn a model immediately but delays the learning until prediction time, it is referred to as a lazy learner.
04. What does this mean: *args, **kwargs in Python? And why would we use it?
Answer- We use *args when we aren’t sure how many arguments are going to be passed to a function, or if we want to pass a stored list or tuple of arguments to a function. **kwargs is used when we don’t know how many keyword arguments will be passed to a function, or it can be used to pass the values of a dictionary as keyword arguments.
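A minimal sketch showing both forms (the function and argument names are made up for the example):
def order(*args, **kwargs):
    # args arrives as a tuple of positional arguments, kwargs as a dict of keyword arguments
    print(args, kwargs)

order(1, 2, size="large")        # (1, 2) {'size': 'large'}
items = [3, 4]
options = {"size": "small"}
order(*items, **options)         # (3, 4) {'size': 'small'} -> unpacking a list and a dict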
————————————————————
Stay Safe & Happy Learning 💙
🚀 CI/CD & Deployment Strategies
✅ How do you set up end-to-end CI/CD automation for a microservices application?
✅ What are the key differences between self-hosted CI/CD solutions (e.g., Jenkins) and cloud-managed CI/CD services (e.g., GitHub Actions, GitLab CI/CD)?
✅ How do you implement progressive delivery using feature flags and canary deployments?
✅ What challenges do you face in deploying CI/CD pipelines across multi-cloud environments, and how do you solve them?
🚀 Infrastructure as Code (IaC) & Configuration Management
✅ How do you ensure idempotency in configuration management tools like Ansible or Chef?
✅ What are the best practices for state file management in Terraform?
✅ How do you structure Terraform modules for scalability and reusability?
✅ How would you use Ansible to dynamically configure infrastructure based on cloud metadata?
🚀 GitOps & Source Control
✅ How do you implement drift detection and auto-remediation in a GitOps workflow?
✅ What is the role of Helm vs. Kustomize in Kubernetes GitOps?
✅ How do you secure Git repositories and CI/CD pipelines to prevent supply chain attacks?
✅ Explain how signed commits and commit verification improve security in Git workflows.
🚀 Cloud Security & Compliance
✅ How do you implement Just-in-Time (JIT) access controls in a cloud environment?
✅ What are the differences between IAM roles, policies, and service accounts, and how do you use them effectively?
✅ How do you implement network segmentation and firewall rules in Kubernetes for security hardening?
✅ What security best practices do you follow when running serverless functions (AWS Lambda, Azure Functions, GCP Cloud Run)?
🚀 Scripting & Automation
✅ Write a Bash script to monitor disk usage and send alerts if usage exceeds 80%.
✅ How would you automate the cleanup of unused Docker images and volumes on a Kubernetes cluster?
✅ Explain how to use Python and AWS SDK (Boto3) to automate EC2 instance creation and tagging.
✅ How do you set up scheduled automation jobs in Kubernetes using CronJobs?
This interview round tested a deep understanding of automation, cloud infrastructure, CI/CD pipelines, and security best practices.
Date: 02-04-2025
Company name: Accenture
Role: Data Scientist
01. What is a UNIQUE constraint?
Answer- The UNIQUE Constraint prevents identical values in a column from appearing in two records. The UNIQUE constraint guarantees that every value in a column is unique.
02. What is schema in SQL Server?
Answer- A schema is a logical representation of the database. It builds and specifies the relationships among the database’s numerous entities, defines the kinds of constraints that may be applied, and describes the various data types. It can be applied to tables and views. Schemas come in a variety of shapes and sizes; the star schema and the snowflake schema are two of the most popular. The entities in a star schema are arranged in a star shape, whereas those in a snowflake schema are arranged in a snowflake shape. Schemas are the foundation of any database architecture.
03. What constraints come under the primary key constraint?
Answer- The PRIMARY KEY constraint uniquely identifies each record in a table. Primary keys must contain UNIQUE values, and cannot contain NULL values. A table can have only ONE primary key; and in the table, this primary key can consist of single or multiple columns (fields). Hence the unique constraint and the not null constraint come under the primary key by default.
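A small sketch of both constraints in action using Python's built-in sqlite3 module (table and column names are made up for the example):
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
cur.execute("INSERT INTO employee VALUES (1, 'a@example.com')")
try:
    cur.execute("INSERT INTO employee VALUES (1, 'b@example.com')")  # duplicate primary key
except sqlite3.IntegrityError as err:
    print("Primary key violation:", err)
try:
    cur.execute("INSERT INTO employee VALUES (2, 'a@example.com')")  # duplicate UNIQUE value
except sqlite3.IntegrityError as err:
    print("Unique constraint violation:", err)
conn.close()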
04. Explain Grant and Revoke in SQL.
Answer- The SQL GRANT command is used to provide privileges on database objects to a user; it also allows users to grant permissions to other users. The REVOKE command withdraws user privileges on database objects, if any were granted; it performs the opposite operation of GRANT. When a privilege is revoked from a particular user U, the privileges granted by user U to other users are also revoked.
————————————————————
Stay Safe & Happy Learning 💙
SNOWFLAKE AND DATABRICKS
Snowflake and Databricks are leading cloud data platforms, but how do you choose the right one for your needs?
🌐 𝐒𝐧𝐨𝐰𝐟𝐥𝐚𝐤𝐞
❄️ 𝐍𝐚𝐭𝐮𝐫𝐞: Snowflake operates as a cloud-native data warehouse-as-a-service, streamlining data storage and management without the need for complex infrastructure setup.
❄️ 𝐒𝐭𝐫𝐞𝐧𝐠𝐭𝐡𝐬: It provides robust ELT (Extract, Load, Transform) capabilities primarily through its COPY command, enabling efficient data loading.
❄️ Snowflake offers dedicated schema and file object definitions, enhancing data organization and accessibility.
❄️ 𝐅𝐥𝐞𝐱𝐢𝐛𝐢𝐥𝐢𝐭𝐲: One of its standout features is the ability to create multiple independent compute clusters that can operate on a single data copy. This flexibility allows for enhanced resource allocation based on varying workloads.
❄️ 𝐃𝐚𝐭𝐚 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐢𝐧𝐠: While Snowflake primarily adopts an ELT approach, it seamlessly integrates with popular third-party ETL tools such as Fivetran, Talend, and supports DBT installation. This integration makes it a versatile choice for organizations looking to leverage existing tools.
🌐 𝐃𝐚𝐭𝐚𝐛𝐫𝐢𝐜𝐤𝐬
❄️ 𝐂𝐨𝐫𝐞: Databricks is fundamentally built around processing power, with native support for Apache Spark, making it an exceptional platform for ETL tasks. This integration allows users to perform complex data transformations efficiently.
❄️ 𝐒𝐭𝐨𝐫𝐚𝐠𝐞: It utilizes a 'data lakehouse' architecture, which combines the features of a data lake with the ability to run SQL queries. This model is gaining traction as organizations seek to leverage both structured and unstructured data in a unified framework.
🌐 𝐊𝐞𝐲 𝐓𝐚𝐤𝐞𝐚𝐰𝐚𝐲𝐬
❄️ 𝐃𝐢𝐬𝐭𝐢𝐧𝐜𝐭 𝐍𝐞𝐞𝐝𝐬: Both Snowflake and Databricks excel in their respective areas, addressing different data management requirements.
❄️ 𝐒𝐧𝐨𝐰𝐟𝐥𝐚𝐤𝐞’𝐬 𝐈𝐝𝐞𝐚𝐥 𝐔𝐬𝐞 𝐂𝐚𝐬𝐞: If you are equipped with established ETL tools like Fivetran, Talend, or Tibco, Snowflake could be the perfect choice. It efficiently manages the complexities of database infrastructure, including partitioning, scalability, and indexing.
❄️ 𝐃𝐚𝐭𝐚𝐛𝐫𝐢𝐜𝐤𝐬 𝐟𝐨𝐫 𝐂𝐨𝐦𝐩𝐥𝐞𝐱 𝐋𝐚𝐧𝐝𝐬𝐜𝐚𝐩𝐞𝐬: Conversely, if your organization deals with a complex data landscape characterized by unpredictable sources and schemas, Databricks—with its schema-on-read technique—may be more advantageous.
🌐 𝐂𝐨𝐧𝐜𝐥𝐮𝐬𝐢𝐨𝐧:
Ultimately, the decision between Snowflake and Databricks should align with your specific data needs and organizational goals. Both platforms have established their niches, and understanding their strengths will guide you in selecting the right tool for your data strategy.
Most asked Python interview questions for Data Engineer jobs with answers!
𝟭. 𝗘𝘅𝗽𝗹𝗮𝗶𝗻 𝘁𝗵𝗲 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗯𝗲𝘁𝘄𝗲𝗲𝗻 𝗹𝗶𝘀𝘁𝘀 𝗮𝗻𝗱 𝘁𝘂𝗽𝗹𝗲𝘀 𝗶𝗻 𝗣𝘆𝘁𝗵𝗼𝗻.
Lists are mutable, meaning their elements can be changed after creation, whereas tuples are immutable.
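For example:
nums_list = [1, 2, 3]
nums_list[0] = 99        # fine: lists are mutable
nums_tuple = (1, 2, 3)
try:
    nums_tuple[0] = 99   # tuples are immutable
except TypeError as err:
    print(err)           # 'tuple' object does not support item assignment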
𝟮. 𝗪𝗵𝗮𝘁 𝗶𝘀 𝗮 𝗗𝗮𝘁𝗮𝗙𝗿𝗮𝗺𝗲 𝗶𝗻 𝗽𝗮𝗻𝗱𝗮𝘀?
A DataFrame is a 2-dimensional labelled data structure, similar to a spreadsheet.
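A tiny example (the sample data is made up):
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "salary": [50000, 62000, 58000],
})
print(df.head())            # rows and labelled columns, like a small spreadsheet
print(df["salary"].mean())  # column-wise operations, e.g. the average salary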
𝟯. 𝗥𝗲𝘃𝗲𝗿𝘀𝗶𝗻𝗴 𝘁𝗵𝗲 𝘄𝗼𝗿𝗱𝘀 𝗶𝗻 𝗮 𝘀𝘁𝗿𝗶𝗻𝗴 𝗶𝗻 𝗣𝘆𝘁𝗵𝗼𝗻
def reverse_words(s: str) -> str:
    words = s.split()
    reversed_words = reversed(words)
    return ' '.join(reversed_words)
𝟰. 𝗪𝗿𝗶𝘁𝗲 𝗮 𝗣𝘆𝘁𝗵𝗼𝗻 𝗳𝘂𝗻𝗰𝘁𝗶𝗼𝗻 𝘁𝗼 𝗰𝗼𝘂𝗻𝘁 𝘁𝗵𝗲 𝗻𝘂𝗺𝗯𝗲𝗿 𝗼𝗳 𝘃𝗼𝘄𝗲𝗹𝘀 𝗶𝗻 𝗮 𝗴𝗶𝘃𝗲𝗻 𝘀𝘁𝗿𝗶𝗻𝗴?
def count_vowels(string: str) -> int:
    vowels = "aeiouAEIOU"
    vowel_count = 0
    for char in string:
        if char in vowels:
            vowel_count += 1
    return vowel_count
I’ve listed 4 but there are many questions you’d need to prepare to succeed in interviews.
All the best 👍👍
Here are the most commonly asked PySpark questions, grouped by topic, that you can prepare for interviews.
𝗥𝗗𝗗𝘀 -
1. What is an RDD in Apache Spark? Explain its characteristics.
2. How are RDDs fault-tolerant in Apache Spark?
3. What are the different ways to create RDDs in Spark?
4. Explain the difference between transformations and actions in RDDs.
5. How does Spark handle data partitioning in RDDs?
6. Can you explain the lineage graph in RDDs and its significance?
7. What is lazy evaluation in Apache Spark RDDs?
8. How can you persist RDDs in memory for faster access?
9. Explain the concept of narrow and wide transformations in RDDs.
10. What are the limitations of RDDs compared to DataFrames and Datasets?
𝗗𝗮𝘁𝗮𝗳𝗿𝗮𝗺𝗲 𝗮𝗻𝗱 𝗗𝗮𝘁𝗮𝘀𝗲𝘁𝘀 -
1. What are DataFrames and Datasets in Apache Spark?
2. What are the differences between DataFrame and RDD?
3. Explain the concept of a schema in a DataFrame.
4. How are DataFrames and Datasets fault-tolerant in Spark?
5. What are the advantages of using DataFrames over RDDs?
6. Explain the Catalyst optimizer in Apache Spark.
7. How can you create DataFrames in Apache Spark?
8. What is the significance of Encoders in Datasets?
9. How does Spark SQL optimize the execution plan for DataFrames?
10. Can you explain the benefits of using Datasets over DataFrames?
𝗦𝗽𝗮𝗿𝗸 𝗦𝗤𝗟 -
1. What is Spark SQL, and how does it relate to Apache Spark?
2. How does Spark SQL leverage DataFrame and Dataset APIs?
3. Explain the role of the Catalyst optimizer in Spark SQL.
4. How can you run SQL queries on DataFrames in Spark SQL?
5. What are the benefits of using Spark SQL over traditional SQL queries?
𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻 -
1. What are some common performance bottlenecks in Apache Spark applications?
2. How can you optimize the shuffle operations in Spark?
3. Explain the significance of data skew and techniques to handle it in Spark.
4. What are some techniques to optimize Spark job execution time?
5. How can you tune memory configurations for better performance in Spark?
6. What is dynamic allocation, and how does it optimize resource usage in Spark?
7. How can you optimize joins in Spark?
8. What are the benefits of partitioning data in Spark?
9. How does Spark leverage data locality for optimization?
All the best 👍👍
Date: 09-04-2025
Company name: Capgemini
Role: Data Scientist
01. What are decorators in Python?
Answer- Decorators are used to add extra behaviour to a function without changing its structure. A decorator is itself a function that takes another function and returns a wrapped version of it, and it is generally defined before the function it enhances. To apply a decorator, we first define the decorator function, then write the function it is applied to and place the decorator above it using the @ symbol.
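A minimal sketch of a decorator (the timing decorator below is just an illustrative example):
import time

def timed(func):
    # wraps the original function and prints how long each call takes
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.time() - start:.4f}s")
        return result
    return wrapper

@timed
def add(a, b):
    return a + b

print(add(2, 3))   # prints the timing line, then 5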
02. What is the ACID property in a database?
Answer- The full form of ACID is atomicity, consistency, isolation, and durability.
•Atomicity refers to the fact that if any aspect of a transaction fails, the whole transaction fails and the database state remains unchanged.
•Consistency means that the data meets all validity guidelines.
•Concurrency management is the primary objective of isolation.
•Durability ensures that once a transaction is committed, it will occur regardless of what happens in between such as a power outage, fire, or some other kind of disturbance.
03. Explain One-hot encoding and Label Encoding. How do they affect the dimensionality of the given dataset?
Answer- One-hot encoding is the representation of categorical variables as binary vectors, while label encoding converts labels/words into numeric form. One-hot encoding increases the dimensionality of the dataset because it creates a new 0/1 variable for each level of the categorical variable, whereas label encoding does not affect the dimensionality because the levels are simply mapped to integer codes (0, 1, 2, …).
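A short pandas sketch of the difference (the sample column is made up):
import pandas as pd

df = pd.DataFrame({"city": ["Delhi", "Mumbai", "Delhi", "Chennai"]})
one_hot = pd.get_dummies(df["city"], prefix="city")   # one new 0/1 column per level -> dimensionality grows
codes, uniques = pd.factorize(df["city"])             # a single integer-coded column -> dimensionality unchanged
df["city_label"] = codes
print(one_hot.shape[1], "one-hot columns vs 1 label-encoded column")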
04. Given a table Employee having columns empName and empId, what will be the result of the SQL query below?
Answer- Select empName from Employee order by 2 asc;
“ORDER BY 2” refers to the second column in the SELECT list, so it is valid only when at least two columns are selected. Because this query selects only one column, it will throw an error.
————————————————————
Stay Safe & Happy Learning 💙
Planning for a Data Engineering interview?
Focus on SQL & Python first. Here are some important questions which you should know.
𝐈𝐦𝐩𝐨𝐫𝐭𝐚𝐧𝐭 𝐒𝐐𝐋 𝐪𝐮𝐞𝐬𝐭𝐢𝐨𝐧𝐬
1- Find the nth order/salary from a table.
2- Find the number of output records for each type of join, given Table 1 & Table 2.
3- YoY, MoM growth-related questions.
4- Find the employee-manager hierarchy (self-join related question), or employees who earn more than their managers.
5- RANK, DENSE_RANK related questions.
6- Row-level scanning questions of medium to high complexity using CTEs or recursive CTEs (missing number/missing item from a list, etc.).
7- Number of matches played by every team, or source-to-destination flight combinations using CROSS JOIN.
8- Use window functions to perform advanced analytical tasks, such as calculating moving averages or detecting outliers.
9- Implement logic to handle hierarchical data, such as finding all descendants of a given node in a tree structure.
10- Identify and remove duplicate records from a table.
𝐈𝐦𝐩𝐨𝐫𝐭𝐚𝐧𝐭 𝐏𝐲𝐭𝐡𝐨𝐧 𝐪𝐮𝐞𝐬𝐭𝐢𝐨𝐧𝐬
1- Reverse a string using extended slicing techniques.
2- Count the vowels in given words.
3- Find the highest occurrence of each word in a string and sort the results.
4- Remove duplicates from a list.
5- Sort a list without using the built-in sort.
6- Find the pairs of numbers in a list whose sum is a given number n (see the sketch after this list).
7- Find the max and min numbers in a list without using built-in functions.
8- Calculate the intersection of two lists without using built-in functions.
9- Write Python code to make API requests to a public API (e.g., a weather API) and process the JSON response.
10- Implement a function to fetch data from a database table, perform data manipulation, and update the database.
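As an illustration of question 6, here is a minimal sketch using a set of previously seen numbers (the sample list and target are made up for the example):
def find_pairs(nums, target):
    # single pass; remember numbers already seen so pairs are found in O(n) time
    seen, pairs = set(), []
    for num in nums:
        complement = target - num
        if complement in seen:
            pairs.append((complement, num))
        seen.add(num)
    return pairs

print(find_pairs([2, 7, 4, 5, 11, 3], 9))   # [(2, 7), (4, 5)]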
All the best 👍👍
Date: 12-04-2025
Company name: Flipkart
Role: Data Scientist
01. What are Support Vectors in SVM?
Answer- Support Vector Machine (SVM) is an algorithm that tries to fit a line (or plane or hyperplane) between the different classes that maximizes the distance from the line to the points of the classes.
In this way, it tries to find a robust separation between the classes. The support vectors are the data points that lie closest to the dividing hyperplane, on the edges of the margin, and they are the points that determine its position.
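A minimal scikit-learn sketch that exposes the support vectors (the toy data is made up for the illustration):
import numpy as np
from sklearn import svm

X = np.array([[0, 0], [1, 1], [2, 2], [8, 8], [9, 9], [10, 10]])  # toy 2-D points from two classes
y = [0, 0, 0, 1, 1, 1]

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)
print(clf.support_vectors_)   # the training points lying on the margin boundaries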
02. Explain Correlation and Covariance?
Answer- Covariance signifies the direction of the linear relationship between two variables, whereas correlation indicates both the direction and strength of the linear relationship between variables.
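A quick numpy illustration (the arrays are made-up, roughly linear data):
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10.5])

print(np.cov(x, y)[0, 1])        # covariance: the sign gives the direction of the relationship
print(np.corrcoef(x, y)[0, 1])   # correlation: bounded in [-1, 1], gives direction and strength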
03. What is P-value?
Answer- P-values are used to make a decision about a hypothesis test. The p-value is the smallest significance level at which you can reject the null hypothesis; the lower the p-value, the stronger the evidence for rejecting the null hypothesis.
04. Does Random Forest need Pruning? Why or why not?
Answer-
(a) Pruning is a data compression technique in machine learning and search algorithms that reduces the size of decision trees by removing sections of the tree that are non-critical and redundant to classify instances.
(b) Random Forest usually does not require pruning because it will not over-fit like a single decision tree. This happens due to the fact that the trees are bootstrapped and that multiple random trees use random features so the individual trees are strong without being correlated with each other.
————————————————————
Stay Safe & Happy Learning 💙
Date: 09-05-2025
Company name: Tredence
Role: Data Scientist
01. What is the exploding gradient problem while using the back propagation technique?
Answer: When large error gradients accumulate and result in very large updates to the neural network weights during training, it is called the exploding gradient problem. The weight values can become so large that they overflow and result in NaN values. This makes the model unstable and causes learning to stall, much like the vanishing gradient problem. This is one of the most commonly asked interview questions on machine learning.
02. What do you mean by Associative Rule Mining (ARM)?
Answer- Associative Rule Mining is one of the techniques for discovering patterns in data, such as features (dimensions) that occur together and features that are correlated. It is mostly used in Market Basket Analysis to find how frequently an itemset occurs in a set of transactions. Association rules have to satisfy a minimum support and a minimum confidence threshold at the same time.
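For reference, the two thresholds mentioned above are usually defined as support(A ⇒ B) = (transactions containing both A and B) / (total transactions) and confidence(A ⇒ B) = support(A and B) / support(A); rules below either threshold are discarded.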
03. What is the primary difference between R square and adjusted R square?
Answer- In linear regression, you use both these values for model validation. However, there is a clear distinction between the two. R square accounts for the variation of all independent variables on the dependent variable. In other words, it considers each independent variable for explaining the variation. In the case of Adjusted R square, it accounts for the significant variables alone for indicating the percentage of variation in the model. By significant, we refer to the P values less than 0.05.
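For reference, adjusted R square is commonly computed as Adjusted R² = 1 − (1 − R²) × (n − 1) / (n − k − 1), where n is the number of observations and k the number of predictors, so adding a variable that does not improve the fit lowers the adjusted value.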
04. What are hard margin and soft Margin SVMs?
Answer- Hard margin SVMs work only if the data is linearly separable, and they are quite sensitive to outliers. Our main objective, however, is to find a good balance between keeping the margin as large as possible and limiting margin violations, i.e., instances that end up inside the margin or even on the wrong side; this approach is called soft margin SVM.
————————————————————
Stay Safe & Happy Learning 💙
SQL Cheatsheet 📝
This SQL cheatsheet is designed to be your quick reference guide for SQL programming. Whether you’re a beginner learning how to query databases or an experienced developer looking for a handy resource, this cheatsheet covers essential SQL topics.
1. Database Basics
- CREATE DATABASE db_name;
- USE db_name;
2. Tables
- Create Table: CREATE TABLE table_name (col1 datatype, col2 datatype);
- Drop Table: DROP TABLE table_name;
- Alter Table: ALTER TABLE table_name ADD column_name datatype;
3. Insert Data
- INSERT INTO table_name (col1, col2) VALUES (val1, val2);
4. Select Queries
- Basic Select: SELECT * FROM table_name;
- Select Specific Columns: SELECT col1, col2 FROM table_name;
- Select with Condition: SELECT * FROM table_name WHERE condition;
5. Update Data
- UPDATE table_name SET col1 = value1 WHERE condition;
6. Delete Data
- DELETE FROM table_name WHERE condition;
7. Joins
- Inner Join: SELECT * FROM table1 INNER JOIN table2 ON table1.col = table2.col;
- Left Join: SELECT * FROM table1 LEFT JOIN table2 ON table1.col = table2.col;
- Right Join: SELECT * FROM table1 RIGHT JOIN table2 ON table1.col = table2.col;
8. Aggregations
- Count: SELECT COUNT(*) FROM table_name;
- Sum: SELECT SUM(col) FROM table_name;
- Group By: SELECT col, COUNT(*) FROM table_name GROUP BY col;
9. Sorting & Limiting
- Order By: SELECT * FROM table_name ORDER BY col ASC|DESC;
- Limit Results: SELECT * FROM table_name LIMIT n;
10. Indexes
- Create Index: CREATE INDEX idx_name ON table_name (col);
- Drop Index: DROP INDEX idx_name;
11. Subqueries
- SELECT * FROM table_name WHERE col IN (SELECT col FROM other_table);
12. Views
- Create View: CREATE VIEW view_name AS SELECT * FROM table_name;
- Drop View: DROP VIEW view_name;
Hope it helps :)
CGI Interview Questions
1) Introduce Yourself
2) What is Full Stack Development?
3) Difference between GET and POST method
4) Explain OOPs Concepts with Examples
5) What is the difference between SQL and NoSQL?
6) Explain SDLC and Agile Model
7) Describe your Final Year Project
8) Write an SQL query to count employees in each department
9) SQL query to find duplicate records in a table
10) Difference between INNER JOIN and LEFT JOIN
11) SQL query to fetch top 3 highest salaries
12) Write a program to check if a number is a palindrome
13) Write a program to check if two strings are anagrams
14) Find the missing number in a sorted array
15) Implement Stack using Array
16) What is Time and Space Complexity?
17) What are your strengths and weaknesses?
18) Why do you want to join CGI?
19) Are you open to working night shifts or relocating?
20) How do you manage deadlines or pressure?
Accenture Interview Questions
1. Tell me about yourself.
2. Tell me about your project and your roles in your project
3. Did you design the framework? If so, describe its folder structure.
4. Tell me what type of framework you are using.
5. Tell me the maven command.
6. Write a program to reverse a string.
7. Open the Amazon URL, search for mobiles, scroll the page two times, and find the XPath of the 7th listed mobile phone. Write a generic XPath that also works when the same XPath is used in a new tab.
8. Suppose a website lists multiple mobile numbers. How would you find which numbers are valid and which are invalid, and also give the counts?
9. Write a selenium program to read the data from the Excel sheet.
10. Difference between method overloading and method overriding
11. What is a primary key, and how is it different from a foreign key?
12. HTTP status codes
13. HTTP method calls
14. What is inheritance, and where do you apply it in your framework?
15. Explain the testng.xml file.