MapReduce Interview Questions and Answers

What happens in case of hardware/software failure?  

The MapReduce framework must be able to recover from both hardware failures (disk failures, RAM errors) and software failures (bugs, unexpected exceptions). Both are common and expected. The framework detects a failed task and re-executes it, typically on another node.

Is it possible to start reducers while some mappers still run? Why?

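No. A reducer's input is grouped by key, so the reduce method can process a key only after every value for that key is available. The last running mapper could still produce a key that a running reducer has already consumed. Some implementations, such as Hadoop, begin copying finished map output to the reducers early, but the reduce method itself does not run until all mappers are done.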

What is a straggler?

A straggler is either a map or a reduce task that takes an unusually long time to complete.

What is speculative execution (also called backup tasks)? What problem does it solve?

An identical copy of the same task is executed on multiple nodes, and the output of the copy that finishes first is used.
Speculative execution helps if a task is slow because of a hardware problem. It does not help if the distribution of values over keys is skewed, because every copy of the task has to process the same oversized input.
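
The idea of backup tasks can be sketched with threads standing in for nodes. This is only an illustration: the names `run_task` and `run_speculatively`, the node labels, and the sleep delays are all hypothetical, and no real framework API is used.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_task(node, delay_seconds, data):
    """One attempt of the task; the sleep stands in for the node's speed."""
    time.sleep(delay_seconds)
    return node, sum(data)


def run_speculatively(attempts, data):
    """Start identical attempts on several 'nodes' and use the first result."""
    with ThreadPoolExecutor(max_workers=len(attempts)) as pool:
        futures = [pool.submit(run_task, node, delay, data)
                   for node, delay in attempts]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()


# The original attempt runs on a slow node; the backup runs on a healthy one.
winner, result = run_speculatively(
    [("slow-node", 0.3), ("backup-node", 0.01)], [1, 2, 3])
```

Both attempts compute the same result, so it does not matter which one wins. A real framework would also kill the slower attempt; this sketch simply lets it finish.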

What does a combiner do?

A combiner does local aggregation of the key/value pairs produced by a mapper, before or during the shuffle and sort phase. In general, it reduces the amount of data to be transferred between nodes.

The framework decides how many times to run it. A combiner may run zero, one, or multiple times on the same input.
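
Because the framework may run the combiner any number of times, the job result must not depend on it. A minimal word-count sketch in plain Python illustrates this; the function names are hypothetical and the framework is simulated by direct function calls.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum counts per key locally; output has the same shape as input."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def reduce_counts(pairs):
    """Reduce step: sum all counts per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


mapper_output = map_words("a b a c a b")

# Zero, one, or two combiner runs must all lead to the same final result.
zero_runs = reduce_counts(mapper_output)
one_run = reduce_counts(combine(mapper_output))
two_runs = reduce_counts(combine(combine(mapper_output)))
```

Here the combiner shrinks six intermediate pairs to three, which is the data that would otherwise cross the network.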

What is the mapper life cycle?

The initialization method is called before any other method. It has no parameters and no output.

The map method is called separately for each key/value pair. It processes input key/value pairs and emits intermediate key/value pairs.

The close method runs after all input key/value pairs have been processed. It should close all open resources. It may also emit key/value pairs.
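
The three life-cycle steps can be sketched as follows. The class `LineLengthMapper`, the method names `setup`, `map`, and `close`, and the driver `run_map_task` are hypothetical stand-ins for what a framework would provide, not a real API.

```python
class LineLengthMapper:
    """Hypothetical mapper that emits the length of every input line."""

    def setup(self):
        # Initialization: no parameters, no output.
        self.records_seen = 0

    def map(self, key, value, emit):
        # Called once per input key/value pair.
        self.records_seen += 1
        emit(key, len(value))

    def close(self, emit):
        # Called after all input was processed; it may still emit pairs.
        emit("records_seen", self.records_seen)


def run_map_task(mapper, records):
    """Simulates the order in which a framework calls the mapper's methods."""
    output = []

    def emit(key, value):
        output.append((key, value))

    mapper.setup()
    for key, value in records:
        mapper.map(key, value, emit)
    mapper.close(emit)
    return output


map_output = run_map_task(LineLengthMapper(), [(0, "hello"), (6, "hi")])
```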

What is the reducer life cycle?

The initialization method is called before any other method. It has no parameters and no output.

The reduce method is called separately for each key/[values list] pair. Its input is a key and an iterator over all intermediate values associated with that key. It processes these intermediate values and emits final key/value pairs.

The close method runs after all input key/value pairs have been processed. It should close all open resources. It may also emit key/value pairs.
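
A similar sketch for the reducer side, again with hypothetical names (`SumReducer`, `run_reduce_task`). The sort and grouping that the framework performs during shuffle and sort is simulated here with `sorted` and `itertools.groupby`.

```python
from itertools import groupby
from operator import itemgetter


class SumReducer:
    """Hypothetical reducer that sums the values of each key."""

    def setup(self):
        self.keys_seen = 0

    def reduce(self, key, values, emit):
        # `values` is an iterator over all values that share `key`.
        self.keys_seen += 1
        emit(key, sum(values))

    def close(self, emit):
        emit("keys_seen", self.keys_seen)


def run_reduce_task(reducer, pairs):
    """Simulates grouping by key and the reducer call order."""
    output = []

    def emit(key, value):
        output.append((key, value))

    reducer.setup()
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        reducer.reduce(key, (value for _, value in group), emit)
    reducer.close(emit)
    return output


reduce_output = run_reduce_task(SumReducer(), [("b", 1), ("a", 2), ("b", 3)])
```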

What is local aggregation and why is it used?

Either a combiner or a mapper combines key/value pairs that share the same key. It may also do some additional preprocessing of the combined values. Only key/value pairs produced by the same mapper are combined.

Key/value pairs created by map tasks are transferred between nodes during the shuffle and sort phase. Local aggregation reduces the amount of data to be transferred.

If the distribution of values over keys is skewed, preprocessing data in the combiner helps to eliminate reduce stragglers.

What is in-mapper combining? What are its advantages and disadvantages over writing a custom combiner?

In-mapper combining is local aggregation (combining of key/value pairs) done inside the mapper.

The map method does not emit key/value pairs; it only updates an internal data structure. The close method combines and preprocesses all stored data and emits the final key/value pairs. The internal data structure is initialized in the init method.

Advantages:

– It runs exactly once. A combiner may run multiple times or not at all.

– We are sure it runs during the map phase. A combiner may run either after the map phase or before the reduce phase. The latter case provides no reduction in transferred data.

– In-mapper combining is typically more efficient. A combiner does not reduce the amount of data produced by mappers; it only groups the generated data afterwards. That causes unnecessary object creation, destruction, serialization, and deserialization.

Disadvantages:

– Scalability bottleneck: the technique depends on having enough memory to store all partial results. To avoid running out of memory, we have to flush partial results regularly. Using a combiner produces no such bottleneck.
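
A minimal sketch of in-mapper combining for word count, including the regular flush mentioned above. The class name, the `flush_threshold` parameter, and the `emit` callback are assumptions made for illustration.

```python
from collections import defaultdict


class InMapperCombiningWordCount:
    """Hypothetical mapper that aggregates counts before emitting them."""

    def __init__(self, flush_threshold=1000):
        # Assumed memory guard: maximum number of distinct keys to hold.
        self.flush_threshold = flush_threshold

    def setup(self):
        self.counts = defaultdict(int)

    def map(self, key, line, emit):
        # Only update the internal structure; normally nothing is emitted here.
        for word in line.split():
            self.counts[word] += 1
        if len(self.counts) >= self.flush_threshold:
            self._flush(emit)

    def close(self, emit):
        self._flush(emit)

    def _flush(self, emit):
        for word, count in self.counts.items():
            emit(word, count)
        self.counts.clear()


emitted = []
mapper = InMapperCombiningWordCount(flush_threshold=1000)
mapper.setup()
for offset, line in enumerate(["a b a", "b a"]):
    mapper.map(offset, line, lambda k, v: emitted.append((k, v)))
mapper.close(lambda k, v: emitted.append((k, v)))
```

With a large threshold the mapper emits one pair per distinct word. A small threshold bounds memory but emits more pairs, which the reducer still sums correctly.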

What is the order inversion design pattern?

Order inversion is used if the algorithm requires two passes through the mapper-generated key/value pairs with the same key. The first pass generates some overall statistic, which is then applied to the data during the second pass. Without the pattern, the reducer would need to buffer the data in memory just to be able to pass through it twice.

With order inversion, the first-pass result is calculated by the mappers and stored in an internal data structure. Each mapper emits the result in its close method, after all the usual intermediate key/value pairs.

The pattern requires custom partitioning and sorting. The first-pass result must reach the reducer before the usual key/value pairs and, of course, it must reach the same reducer.
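
As an illustration, the sketch below computes relative frequencies of word pairs, where the per-word total is the first-pass statistic. The `"*"` marker, the function names, and the toy partitioner are assumptions for this example; the framework is simulated in plain Python.

```python
from collections import defaultdict
from itertools import groupby

MARGINAL = "*"  # assumed marker for the first-pass result; not a real word


def map_task(pairs):
    """Emit ((left, right), 1) per pair, then the per-left totals at close."""
    output = []
    totals = defaultdict(int)
    for left, right in pairs:
        output.append(((left, right), 1))
        totals[left] += 1  # first-pass statistic kept in memory
    for left, total in totals.items():  # the "close" step
        output.append(((left, MARGINAL), total))
    return output


def partition(key, num_reducers):
    """Custom partitioner: route by the left word only."""
    left, _ = key
    return sum(left.encode()) % num_reducers


def sort_key(item):
    """Custom sort: for each left word, the marker comes before real pairs."""
    (left, right), _ = item
    return (left, right != MARGINAL, right)


def reduce_task(items):
    """The total arrives first, so no buffering of pairs is needed."""
    frequencies = {}
    total = 0
    ordered = sorted(items, key=sort_key)
    for (left, right), group in groupby(ordered, key=lambda item: item[0]):
        count = sum(value for _, value in group)
        if right == MARGINAL:
            total = count
        else:
            frequencies[(left, right)] = count / total
    return frequencies


splits = [[("a", "b"), ("a", "c")], [("a", "b"), ("d", "e")]]
buckets = defaultdict(list)
for split in splits:
    for key, value in map_task(split):
        buckets[partition(key, 2)].append((key, value))

relative_frequencies = {}
for items in buckets.values():
    relative_frequencies.update(reduce_task(items))
```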

How does a reduce-side join between tables with a one-to-one relationship work?

The mapper produces key/value pairs with join ids as keys and row values as values. Corresponding rows from both tables are grouped together by the framework during the shuffle and sort phase.

The reduce method obtains a join id and two values, each representing a row from one table. The reducer joins the data.
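
A small simulation of this join is sketched below. Tagging each row with its table name is an assumption added here so the reducer can tell the two rows apart. The table contents and function names are made up for illustration.

```python
from collections import defaultdict


def map_rows(table_name, rows, join_column):
    """Emit (join id, (table name, row)) for every row."""
    return [(row[join_column], (table_name, row)) for row in rows]


def shuffle(pairs):
    """Stands in for shuffle and sort: group values by join id."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_join(join_id, tagged_rows):
    """Merge the two rows that share a join id; skip ids without a partner."""
    if len(tagged_rows) != 2:
        return None
    joined = {}
    for _, row in sorted(tagged_rows, key=lambda tagged: tagged[0]):
        joined.update(row)
    return joined


users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
emails = [{"id": 1, "email": "ann@example.com"},
          {"id": 3, "email": "cy@example.com"}]

pairs = map_rows("users", users, "id") + map_rows("emails", emails, "id")
joined_rows = []
for join_id, tagged_rows in sorted(shuffle(pairs).items()):
    joined = reduce_join(join_id, tagged_rows)
    if joined is not None:
        joined_rows.append(joined)
```

Dropping ids that appear in only one table makes this an inner join; that choice is part of the sketch, not of the pattern.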

How does a map-side join between two database tables work?

A map-side join works only if the following assumptions hold:
– both datasets are sorted by the join key,
– both datasets are partitioned the same way.

The mapper maps over the larger dataset and reads the corresponding part of the smaller dataset inside the mapper. As the smaller dataset is partitioned the same way as the larger one, each part of it is read by only one map task. As the data are sorted by the join key, the mapper can join the two parts in a single merge pass, without buffering either one.
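
The merge pass inside one map task can be sketched as follows. The sketch assumes integer join keys that are unique in the smaller dataset; the function names and the modulo partitioner are hypothetical.

```python
def partition_sorted(records, num_partitions):
    """Partition (key, value) records by key and keep each part sorted."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in sorted(records):
        parts[key % num_partitions].append((key, value))
    return parts


def merge_join(larger, smaller):
    """Join two key-sorted lists in one pass; keys in `smaller` are unique."""
    joined = []
    position = 0
    for key, value in larger:
        # Advance the cursor in the smaller part up to the current key.
        while position < len(smaller) and smaller[position][0] < key:
            position += 1
        if position < len(smaller) and smaller[position][0] == key:
            joined.append((key, value, smaller[position][1]))
    return joined


orders = [(3, "order-c"), (1, "order-a"), (4, "order-d"),
          (2, "order-b"), (1, "order-e")]
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]

order_parts = partition_sorted(orders, 2)
customer_parts = partition_sorted(customers, 2)

# Each "map task" joins one part of the larger set with the matching part
# of the smaller set.
map_side_joined = []
for order_part, customer_part in zip(order_parts, customer_parts):
    map_side_joined.extend(merge_join(order_part, customer_part))
```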

How does a memory-backed join work?

The smaller dataset is loaded into memory in every mapper. Mappers loop over the larger dataset and join it with the data in memory. If the smaller dataset is too big to fit into memory, it is loaded into memcached or some other caching solution.
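
A minimal sketch of the in-memory case, using a plain dictionary as the lookup structure. The class name and the sample data are hypothetical; with an external cache, the dictionary lookup would be replaced by a cache client call.

```python
class MemoryBackedJoinMapper:
    """Hypothetical mapper that joins its input against an in-memory table."""

    def __init__(self, smaller_rows):
        self.smaller_rows = smaller_rows

    def setup(self):
        # Load the whole smaller dataset into memory once per mapper.
        self.lookup = {key: value for key, value in self.smaller_rows}

    def map(self, key, value, emit):
        # Join each record of the larger dataset with the in-memory data.
        if key in self.lookup:
            emit(key, (value, self.lookup[key]))


customers = [(1, "Ann"), (2, "Bob")]
orders = [(2, "order-b"), (1, "order-a"), (5, "order-x"), (1, "order-e")]

memory_joined = []
join_mapper = MemoryBackedJoinMapper(customers)
join_mapper.setup()
for key, value in orders:
    join_mapper.map(key, value, lambda k, v: memory_joined.append((k, v)))
```

Unlike the map-side join, this approach needs no sorting or matching partitioning of the inputs; the only requirement is that the smaller dataset fits in memory or in the cache.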