Friday, May 4, 2012

A strange issue while using MultipleOutputs (Hadoop)

I used MultipleOutputs to write related records to separate files, one file per group.
The files were created on every run, but only one of them had any content, and the file size was always 4096 ...

I printed a lot of log output to track down the problem, but everything in the reduce phase seemed to go to the right place.
...
....
......
After double-checking with the book "Hadoop: The Definitive Guide", I finally found my bug: the output stream was not closed correctly.
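To see why an unclosed stream can leave a file without content, here is a small plain-Java analogy. It is not Hadoop code and does not reproduce the exact symptom from my job; the class name UnclosedWriterDemo and the temp file are made up for illustration. It only shows that a buffered writer keeps a short record in memory until it is closed (or flushed).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class: a plain-Java analogy, not part of Hadoop.
public class UnclosedWriterDemo {

    /** Writes one short record and returns the on-disk size before and after close(). */
    public static long[] sizesBeforeAndAfterClose() throws IOException {
        Path file = Files.createTempFile("unclosed-writer-demo", ".txt");
        try {
            BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
            // 10 bytes, far smaller than the writer's buffer, so nothing reaches the disk yet.
            writer.write("key\tvalue\n");
            long before = Files.size(file);
            // close() flushes the buffered record to the file.
            writer.close();
            long after = Files.size(file);
            return new long[] {before, after};
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        long[] sizes = sizesBeforeAndAfterClose();
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
    }
}
```

If the program exited without calling close(), the record would simply be lost, even though the file itself had been created.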

Override the method public void cleanup(Context) in the Reduce class, and call .close() on the MultipleOutputs instance.
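A minimal sketch of what the fixed reducer can look like, using the new (org.apache.hadoop.mapreduce) API. This is not my original job code: the class name SplitByKeyReducer, the Text key/value types, and the choice of the key as the base output path are all assumptions for illustration. The null check in cleanup is just defensive.

```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer: writes each key's records to its own output file.
public class SplitByKeyReducer extends Reducer<Text, Text, Text, Text> {

    private MultipleOutputs<Text, Text> multipleOutputs;

    @Override
    protected void setup(Context context) {
        // One MultipleOutputs instance per reduce task.
        multipleOutputs = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // Assumption: the key itself is used as the base output path,
            // so records with the same key end up in the same file.
            multipleOutputs.write(key, value, key.toString());
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // The fix: close MultipleOutputs so every underlying writer is flushed and closed.
        if (multipleOutputs != null) {
            multipleOutputs.close();
        }
    }
}
```

Reducer declares cleanup as protected; declaring the override public also compiles, since Java allows an override to widen visibility.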

^.^ Fixed~

