fix for NUTCH-2455 more efficient usage of hostdb in generate #254
Please review only the fix for NUTCH-2455, "more efficient usage of hostdb in generate" (c1ce018). The "added id to output files" commit is not correct; I have reverted it.
Hi @okedoki, can you please format this entire patch according to the eclipse-codeformatter? Thank you.
I found a bug in the partitioner that prevented hostdb data from reaching the correct reducer. It is fixed. For some reason, I have a conflict with Generator from master. I assume it happened because of autoformatting: instead of a proper comparison, the diff shows the whole Generator code as replaced. What is the rule for fixing this case?
Mmmm OK @okedoki, we need to resolve this conflict. The issue here is that you have indented everything by 4 spaces, by the looks of it. This is incorrect: according to the code formatting template, indentation is 2 spaces. Please update the pull request again if you could. Thanks
@lewismc
@okedoki thank you very much. This is a big patch and we need to test it out.
… by reference, fixed with clone
There was a silly bug: the hostdb datum was not copied correctly in the reducer because it was stored by reference. Fixed by cloning it at line 484: hostDomainCounts.put(key.second.toString(), new MutablePair<HostDatum, int[]>((HostDatum) hostDatum.clone(), new int[] {1, 0}));
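For readers unfamiliar with this class of bug, here is a minimal, self-contained sketch of why storing a reference instead of a clone goes wrong. It assumes the value object is refilled for each record, as happens when a framework reuses one instance while iterating. The `Datum` class and the `collect` helper are hypothetical stand-ins, not code from the patch or from Nutch.

```java
import java.util.HashMap;
import java.util.Map;

public class CloneInReducerSketch {

  /** Hypothetical stand-in for a mutable, cloneable record such as HostDatum. */
  static class Datum implements Cloneable {
    String host;
    int score;

    void set(String host, int score) {
      this.host = host;
      this.score = score;
    }

    @Override
    public Datum clone() {
      try {
        return (Datum) super.clone();
      } catch (CloneNotSupportedException e) {
        throw new AssertionError(e); // cannot happen: Datum is Cloneable
      }
    }
  }

  /**
   * Stores one entry per host. A single Datum instance is refilled for every
   * record (assumption: this mimics a framework that reuses value objects).
   * With copy == false the map keeps references to that one shared instance.
   */
  static Map<String, Datum> collect(String[] hosts, int[] scores, boolean copy) {
    Datum reused = new Datum();
    Map<String, Datum> byHost = new HashMap<>();
    for (int i = 0; i < hosts.length; i++) {
      reused.set(hosts[i], scores[i]);
      byHost.put(hosts[i], copy ? reused.clone() : reused);
    }
    return byHost;
  }

  public static void main(String[] args) {
    String[] hosts = {"a.example", "b.example"};
    int[] scores = {1, 2};

    // Without cloning, every entry points at the same object, so the first
    // host ends up with the last record's values.
    System.out.println(collect(hosts, scores, false).get("a.example").score); // prints 2

    // With cloning, each entry keeps the values it was stored with.
    System.out.println(collect(hosts, scores, true).get("a.example").score); // prints 1
  }
}
```

The clone costs one extra allocation per stored host, which is the usual trade-off when values must outlive the iteration step that produced them.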
Three questions/modifications left open: