Latest commit: d79dffb · Aug 26, 2018

Questions

General

Tell me about yourself. / What was your last project, and how did you use Hadoop in it?

What are your responsibilities?

What problems did you face, and how did you debug them?

Hadoop Questions

Tell us about your cluster.

What is shuffling?

What are some optimization techniques for Hadoop?

When is more shuffling better?

How to handle small files in Hadoop?

What is federation mode in Hadoop?

What are the different nodes in a Hadoop cluster?

What is the difference between the Backup Node and the Secondary NameNode?

What is DistCp?

What is speculative execution?

For a file of size MB how many blocks will be created on HDFS?
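
HDFS splits a file into fixed-size blocks, so the count is a ceiling division of the file size by the block size. A minimal sketch, assuming the Hadoop 2.x/3.x default `dfs.blocksize` of 128 MB; `hdfs_block_count` is a hypothetical helper, and 500 MB is an illustrative size, not the one intended by the question.

```python
import math


def hdfs_block_count(file_size_mb: float, block_size_mb: float = 128) -> int:
    """Blocks needed for one file, before replication (hypothetical helper)."""
    if file_size_mb <= 0:
        return 0  # an empty file occupies no blocks
    # Full blocks plus one partial block for any remainder.
    return math.ceil(file_size_mb / block_size_mb)


print(hdfs_block_count(500))  # 4: three full 128 MB blocks and one 116 MB block
```

The last block is not padded, so it uses only 116 MB of disk. With a replication factor of 3, the cluster would store 3 × 4 = 12 block replicas.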

Apache Spark Questions

What mechanism can you use to resubmit a failed Spark job automatically?

What are left semi join and left anti join in Spark?
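
In Spark these are written `left.join(right, "id", "left_semi")` and `left.join(right, "id", "left_anti")`. Their semantics can be sketched in plain Python, assuming rows are dicts and ignoring SQL NULL handling; the function and table names are hypothetical. A semi join keeps each matching left row once and adds no columns from the right side; an anti join keeps only the left rows that have no match.

```python
def left_semi_join(left_rows, right_rows, key):
    """Left rows with at least one match on key; right-side columns are never added."""
    right_keys = {row[key] for row in right_rows}
    return [row for row in left_rows if row[key] in right_keys]


def left_anti_join(left_rows, right_rows, key):
    """Left rows with no match on key."""
    right_keys = {row[key] for row in right_rows}
    return [row for row in left_rows if row[key] not in right_keys]


employees = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}, {"id": 3, "name": "Cy"}]
salaries = [{"id": 1, "amount": 10}, {"id": 1, "amount": 20}, {"id": 3, "amount": 30}]

print(left_semi_join(employees, salaries, "id"))  # rows for ids 1 and 3, each once
print(left_anti_join(employees, salaries, "id"))  # the row for id 2
```

Note that id 1 appears twice in `salaries` but only once in the semi-join result, which is what distinguishes a semi join from an inner join followed by a projection.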

How does Spark ensure fault tolerance?

What is the difference between groupByKey and reduceByKey?
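
The practical difference is where aggregation happens: `reduceByKey` combines values inside each partition before the shuffle, while `groupByKey` ships every record and aggregates afterwards. A pure-Python analogy (not Spark code) that counts the records crossing a simulated shuffle for a word count; the partitions are made-up data. In PySpark the two versions would be `rdd.groupByKey().mapValues(sum)` and `rdd.reduceByKey(operator.add)`.

```python
from collections import defaultdict

# Two made-up input partitions of (word, 1) pairs.
partitions = [
    [("spark", 1), ("hive", 1), ("spark", 1), ("spark", 1)],
    [("hive", 1), ("spark", 1), ("hive", 1)],
]


def group_by_key_style(parts):
    """Shuffle every record, then sum the grouped values (like groupByKey + sum)."""
    shuffled = [pair for part in parts for pair in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    totals = {key: sum(values) for key, values in groups.items()}
    return totals, len(shuffled)


def reduce_by_key_style(parts):
    """Sum inside each partition first (map-side combine), then shuffle partial sums."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


print(group_by_key_style(partitions))   # ({'spark': 4, 'hive': 3}, 7)
print(reduce_by_key_style(partitions))  # ({'spark': 4, 'hive': 3}, 4)
```

Both produce the same totals, but the combined version moves 4 records instead of 7. On real data with many repeated keys, that gap is why `reduceByKey` is the usual choice for aggregations.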

What are wide and narrow transformations? / When does a stage boundary occur?

What are broadcast variables and accumulators?

How to broadcast a small DataFrame?

What is an RDD?

How to create an RDD?

Have you done any optimization in Spark?

How to read an XML file using MapReduce?

What is the difference between HiveQL and Spark SQL?

What is the difference between SQL and HiveQL?

What is the difference between RDD, DataFrame, and Dataset? What is the difference between RDD and Dataset?

What is a DAG and what is it used for?

Given , we have transformation,. actions, transformations again and one more action, how many DAGs will be there? / What is the output of transformations and actions?

What options can be passed to the spark-submit script for memory allocation? / How do we specify the main class when submitting an application? / How will you specify parallelism in Spark?

What are the various deploy modes in spark-submit?

Difference between yarn-client and yarn-cluster mode?

What is your understanding of specifying “--total-executor-cores” when submitting a Spark job?

How to dynamically control the number of executors?

What is the difference between a partition and a partitioner?

How to find the Spark version?

If you are getting a memory error on the cluster, how will you resolve it?

What is a transformation and what is an action? What is the difference between them?

What transformations have you used?

What actions have you used?

Difference between Apache Spark and Apache Storm?

What is Lazy Evaluation in Apache Spark?
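
Spark transformations only record lineage; nothing is computed until an action runs. Python generators behave similarly and make a convenient analogy. This is a plain-Python sketch, not Spark code, and `parse` is a hypothetical function that logs each call.

```python
calls = []


def parse(line):
    calls.append(line)  # records the moment work is actually done
    return int(line)


lines = ["1", "2", "3", "4"]

# "Transformations": build a lazy pipeline; no element is processed yet.
numbers = (parse(line) for line in lines)
evens = (n for n in numbers if n % 2 == 0)
calls_before_action = len(calls)

# "Action": consuming the pipeline forces every step to run.
result = list(evens)

print(calls_before_action)  # 0
print(result, len(calls))   # [2, 4] 4
```

The analogy has limits: a generator can be consumed only once, whereas an RDD is recomputed from its lineage on every action unless it is cached, and Spark can plan the whole chain (for example, pipelining narrow transformations into one stage) before running it.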

What is the difference between cache and persist?

Have you encountered memory errors, and how did you resolve them? For example, java.lang.OutOfMemoryError: Java heap space in Spark.

How to find the delta between two files?

How to find the difference between two DataFrames?
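
For DataFrames, Spark offers `df1.subtract(df2)` (distinct rows, like SQL `EXCEPT`) and `df1.exceptAll(df2)` (keeps duplicates, Spark 2.4+). The same idea for two files read as lists of lines can be sketched with `collections.Counter`, assuming both files fit in memory; `delta` is a hypothetical helper and the sample lines are made up.

```python
from collections import Counter


def delta(old_lines, new_lines):
    """Return (added, removed) as multiset differences, so duplicates count."""
    old_counts, new_counts = Counter(old_lines), Counter(new_lines)
    added = sorted((new_counts - old_counts).elements())
    removed = sorted((old_counts - new_counts).elements())
    return added, removed


old_file = ["a,1", "b,2", "b,2", "c,3"]
new_file = ["a,1", "b,2", "d,4"]
print(delta(old_file, new_file))  # (['d,4'], ['b,2', 'c,3'])
```

One copy of `b,2` shows up as removed because the old file had it twice and the new file once; a plain set difference would miss that.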

Explain what Oozie is.

Have you faced any heap/memory issues in Spark?

How does Spark Streaming work? Explain how you would find the max value within a streaming RDD.
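
Spark Streaming processes the stream as a sequence of small batches (each one an RDD in the DStream API), so "max within a streaming RDD" is usually answered per batch, optionally combined with a running maximum kept as state. A plain-Python sketch of both with made-up batches; in the DStream API the per-batch part corresponds roughly to `dstream.reduce(max)`, and the running value would need a stateful operation such as `updateStateByKey`.

```python
batches = [[3, 9, 4], [], [12, 1], [7]]  # hypothetical micro-batches

per_batch_max = []
running_max = None
for batch in batches:
    batch_max = max(batch, default=None)  # empty batch: no maximum
    per_batch_max.append(batch_max)
    if batch_max is not None and (running_max is None or batch_max > running_max):
        running_max = batch_max

print(per_batch_max, running_max)  # [9, None, 12, 7] 12
```

The `None` placeholder is only for the sketch; in Spark an empty batch simply produces no value.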

What is the basic component of Spark Streaming?

Apache Hive Questions

What is the difference between static and dynamic partitioning?

What is the difference between a natural key join and a cross join?

What is the difference between bucketing and partitioning in Hive? / Give me an example of bucketing and partitioning.
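
Partitioning creates one directory per distinct value of the partition column, while bucketing spreads rows over a fixed number of files chosen by hashing the bucketing column. A plain-Python sketch of the resulting layout, assuming a table partitioned by `country` and bucketed by `user_id` into 4 buckets; the data is made up, and plain `user_id % 4` stands in for Hive's own bucketing hash.

```python
from collections import defaultdict

# Made-up rows for a table PARTITIONED BY (country), CLUSTERED BY (user_id) INTO 4 BUCKETS.
rows = [
    {"country": "US", "user_id": 7},
    {"country": "IN", "user_id": 12},
    {"country": "US", "user_id": 20},
    {"country": "IN", "user_id": 3},
]
NUM_BUCKETS = 4


def partition_dir(row):
    """Partitioning: one sub-directory per distinct country value."""
    return f"country={row['country']}"


def bucket_id(user_id):
    """Bucketing (simplified): user_id mod N stands in for Hive's bucketing hash."""
    return user_id % NUM_BUCKETS


layout = defaultdict(list)
for row in rows:
    layout[(partition_dir(row), bucket_id(row["user_id"]))].append(row["user_id"])

for (directory, bucket), user_ids in sorted(layout.items()):
    print(directory, "bucket", bucket, user_ids)
```

Filtering on `country` lets Hive skip whole directories (partition pruning), while the fixed bucket count on `user_id` is what enables bucket map joins and bucket sampling.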

After creating a table in Hive, I want to access the data in Impala, but when I run the query in Impala it doesn't show the updated information. How should I fix it?

I have a dataset of key-value pairs. I want to shuffle the data randomly so that no value stays associated with its original key. How can I achieve this using Hive?
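
One approach is to number the rows in random order (in Hive, for example, with `row_number() over (order by rand())`) and give each key the value from the next row, wrapping around at the end. Because every value moves one step along a single random cycle, no key keeps its own value. A plain-Python sketch of that logic, assuming at least two rows and distinct values; `shuffle_values_off_key` is a hypothetical name, and the Hive self-join on the row numbers is not shown.

```python
import random


def shuffle_values_off_key(pairs, seed=None):
    """Reassign values so that no key ends up with its original value.

    Assumes at least two rows and distinct values (duplicates could still collide).
    """
    if len(pairs) < 2:
        raise ValueError("need at least two rows to move every value")
    rng = random.Random(seed)
    rows = list(pairs)
    rng.shuffle(rows)                  # random row numbering
    keys = [key for key, _ in rows]
    values = [value for _, value in rows]
    shifted = values[1:] + values[:1]  # row i takes row i+1's value (cyclic)
    return list(zip(keys, shifted))


data = [("k1", "a"), ("k2", "b"), ("k3", "c"), ("k4", "d")]
print(shuffle_values_off_key(data, seed=1))
```

The result is random but not uniform over all possible arrangements: it only ever produces single-cycle permutations, which is usually acceptable for masking or test-data generation.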

How to find where Hive stores the files?

How to print column names along with the data in Hive?

Different ordering techniques in Hive?

How to add th column to an existing external Hive table, filled with a random sequence?

What are the problems with joining a bucketed table to a partitioned table?

What is a map-side join?

What is a streaming join?

How to broadcast-join a small table with a larger table?

What is vectorization, and when is it beneficial and when is it not?

What is an SMB (sort-merge-bucket) join?

Have you used windowing/analytic functions?

What are the types of tables in Hive? / What kinds of tables have you used in Hive?

How to specify various file formats when creating a table in Hive? Do you know about SerDe?

What is the difference between a managed table and an external table?

Optimizations in Hive?

What are the three execution engines available in Hive?

Why Hive and not HBase? / What is the difference between Hive and HBase?

Tell me how you would design a database for a streaming platform.

File Format Questions

What are the different file formats you have used on Hadoop?

What is Parquet and what advantages does it have over other file formats?

What is the difference between Parquet and ORC (Optimized Row Columnar), and which one is better?

What compression are you using in Apache Hive?

Between Avro and Parquet, which one is better?

Apache Oozie Questions

Tell me some Oozie actions.

Between crontab and Oozie, which one is better? / Why Oozie?

How to connect Oozie to an external control-flow tool like Control-M based on actions?

How to set up Oozie to skip running jobs on holidays?

Explain how Oozie works.

How does Flume ensure fault tolerance?

How to find running daemons in standalone mode?

Which Sqoop version are you using, and when was mainframe support added to Sqoop?

Have you implemented Kerberos on a Hadoop cluster? Or have you used the K commands (kinit, klist, and so on)?

How to dynamically control the number of mappers when running a Sqoop job?

Python Questions

Is Python a compiled language?
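
For CPython the usual answer is "both": source code is compiled to bytecode (cached as `.pyc` files for imported modules), and that bytecode is then executed by the interpreter's virtual machine. The built-in `compile` and the `dis` module make the intermediate step visible; the variable names here are illustrative.

```python
import dis

source = "total = price * quantity"

# Step 1: compile source text into a bytecode object.
code = compile(source, "<example>", "exec")
print(type(code).__name__)  # code

# Inspect the bytecode instructions (exact output varies by Python version).
dis.dis(code)

# Step 2: the interpreter's virtual machine executes the bytecode.
namespace = {"price": 3, "quantity": 4}
exec(code, namespace)
print(namespace["total"])  # 12
```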

Which standard Python libraries have you used?

What is the size of an integer in Python?
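
In Python 3, `int` has arbitrary precision, so there is no fixed size and no overflow; memory use grows with the magnitude of the value. A short demonstration; the exact byte counts from `sys.getsizeof` are a CPython implementation detail that varies by version and platform, so only the comparison is shown.

```python
import sys

small = 1
huge = 2 ** 100  # far larger than a 64-bit machine integer

print(huge > 2 ** 63 - 1)                          # True: no overflow occurs
print(huge.bit_length())                           # 101
print(sys.getsizeof(small) < sys.getsizeof(huge))  # True on CPython: bigger value, more memory
```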

What are the serialization formats in Python?
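
Common choices in the standard library are `json` (text, language-neutral, basic types only), `pickle` (binary, Python-specific, most Python objects), and `csv`; formats such as Avro or Parquet need third-party packages. A round-trip sketch with the first two; the record is made up.

```python
import json
import pickle

record = {"id": 7, "tags": ["spark", "hive"], "score": 0.5}

# JSON: human-readable text; tuples come back as lists, sets are not supported.
as_json = json.dumps(record)
assert json.loads(as_json) == record

# pickle: compact binary for arbitrary Python objects.
# Never unpickle data from an untrusted source: loading can execute code.
as_pickle = pickle.dumps(record)
assert pickle.loads(as_pickle) == record

print(type(as_json).__name__, type(as_pickle).__name__)  # str bytes
```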

How does Python manage memory?

Is tuple mutable or immutable?
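
Tuples are immutable: their elements cannot be reassigned, added, or removed. The subtle part is that a tuple can still hold a mutable object whose contents can change. A small demonstration with illustrative names:

```python
point = (3, 4)
raised = False
try:
    point[0] = 10  # item assignment is not supported on tuples
except TypeError as err:
    raised = True
    print("TypeError:", err)

# The tuple's references are fixed, but the list it refers to is still mutable.
holder = ([1, 2],)
holder[0].append(3)
print(holder)  # ([1, 2, 3],)
```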

Tell me some data structures in Python.

Why Python instead of Scala on Spark when Scala has better performance?

How to access a file on a Linux server using Python?

What is a list comprehension? Show an example.
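
A list comprehension builds a list from an iterable in a single expression, `[expression for item in iterable if condition]`. A sketch comparing it with the equivalent loop; the data is illustrative.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Explicit loop: squares of the even numbers.
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

# Same result as a list comprehension.
squares_comp = [n * n for n in numbers if n % 2 == 0]

print(squares_comp)  # [4, 16, 36]
```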

What version of Azure Data Factory are you using?

How to execute Java code?

How to find the PID of a process, and how to find how many resources a process is using on Linux?

What is Amazon’s version of Hadoop?

What are some tools for data ingestion?

Behavioural Questions

Please give an example of a time when you were working on a project and encountered a problem and how you solved it.

Tell me about a time you and a colleague had a conflict of opinions. How did you resolve it?

Tell me about the most difficult task you faced on the job.

How would you manage / what steps would you take if you felt you weren't going to meet a deadline?

How have you helped improve a system/process and make it more efficient?

Give an example of where you’ve worked as part of a team.

How do you deal with a difficult employee?

Apart from getting on well at your job, what else do you want to achieve in your first few months?

How do you see yourself in years?

What are your strengths?

What is an example of a challenge(s) you faced doing research for a client?

Give three words that describe you.

Tell me a little bit about yourself and why you would be a good fit for <>.

What do you know about <>?

Why are you looking to move jobs?

Why do you want to work for <>?

Give an example of an occasion when you used logic to solve a problem.

Give an example of a goal you reached and tell me how you achieved it.

Describe a decision you made that was unpopular and how you handled implementing it.

Have you gone above and beyond the call of duty? If so, how?

What do you do when your schedule is interrupted? Give an example of how you handle it. / Describe a situation where you slipped behind your deadline. What was the impact on the client?

How did you control the outcome?

Have you had to convince a team to work on a project they weren't thrilled about? How did you do it?

Have you handled a difficult situation with a co-worker? How?

Tell me about how you worked effectively under pressure.

Give me a specific example of a time when you had to conform to a policy with which you did not agree.