Adding HBase raw tap #37
@@ -0,0 +1,81 @@
<?xml version="1.0" encoding="UTF-8"?>
Sorry to nitpick, but would you mind removing the POM? I think we should convert to maven soon, but probably in a different pull req.
Hey, sorry for the delay -- the right way to go might be to get the functionality in that other HBase library, given that we've been neglecting maple a bit. @azymnis is going to take a look shortly. Awesome work!
OK. I think I will create a pull request against SpyGlass, it seems to make sense consolidating the HBase support under one project.
I think SpyGlass is the right place for the pull req. Very timely. I'll go ahead and close this request in lieu of that.
This change adds HBaseRawTap and HBaseRawScheme.
The point of this tap is to avoid having to define the input columns up front, and to allow more control over how HBase rows are handled in a Cascading (and Scalding) job.
The source tap outputs pairs of (rowkey, row), where row is the actual row object. This makes it possible to collect and manipulate a changing set of columns in the mapper without predefining them.
So, for instance, the first mapper in the pipe can transform the row like this (using Scalding syntax):
hbaseSource.map(('rowkey, 'row) -> ('key, 'field1, 'field2, 'field3))
where the output fields can be a combination of different columns in each row.
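As a rough illustration of the kind of function that could sit behind that map call, here is a minimal sketch in plain Scala with no HBase or Scalding dependency. The `Row` type is a hypothetical stand-in (column name to raw bytes) for the row object the tap emits, and the column names `name`, `full_name`, `city`, `zip` and `postcode` are invented for the example.

```scala
object RawRowSketch {
  // Hypothetical stand-in for an HBase row: column name -> raw cell bytes.
  type Row = Map[String, Array[Byte]]

  private def asString(bytes: Array[Byte]): String = new String(bytes, "UTF-8")

  // Turns one (rowkey, row) pair into (key, field1, field2, field3).
  // Each output field takes the first of several candidate columns that is
  // present in this row, or the empty string when none of them is.
  def toFields(rowkey: Array[Byte], row: Row): (String, String, String, String) = {
    def firstOf(names: String*): String =
      names.find(row.contains).map(n => asString(row(n))).getOrElse("")

    (asString(rowkey), firstOf("name", "full_name"), firstOf("city"), firstOf("zip", "postcode"))
  }

  def main(args: Array[String]): Unit = {
    // Two rows with different column sets, as can happen in a sparse table.
    val a: Row = Map("name" -> "Ada".getBytes("UTF-8"), "zip" -> "12345".getBytes("UTF-8"))
    val b: Row = Map("full_name" -> "Bob".getBytes("UTF-8"), "city" -> "Oslo".getBytes("UTF-8"))
    println(toFields("r1".getBytes("UTF-8"), a))
    println(toFields("r2".getBytes("UTF-8"), b))
  }
}
```

In an actual job the body would read cells from the real row object instead of a `Map`, but the idea is the same: the set of columns is decided per row inside the mapper rather than declared on the tap.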
The source tap also adds support for providing a scan object (base64 encoded) for fully customizing the HBase read.
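To show why a Base64 string is a convenient carrier for a scan definition, the sketch below round-trips a small binary structure through text. `ScanSpec` and its fields are hypothetical and are not the HBase `Scan` class, which has its own serialization; only the encode-to-string and decode-back pattern is the point here.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.util.Base64

// Hypothetical stand-in for a scan definition; not the HBase Scan class.
final case class ScanSpec(startRow: String, stopRow: String, families: List[String])

object ScanCodecSketch {
  // Serialize the spec to bytes, then to a Base64 string that can be passed
  // around as an ordinary text value.
  def encode(spec: ScanSpec): String = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    out.writeUTF(spec.startRow)
    out.writeUTF(spec.stopRow)
    out.writeInt(spec.families.size)
    spec.families.foreach(f => out.writeUTF(f))
    out.flush()
    Base64.getEncoder.encodeToString(buffer.toByteArray)
  }

  // Reverse of encode: Base64 string -> bytes -> spec.
  def decode(text: String): ScanSpec = {
    val in = new DataInputStream(new ByteArrayInputStream(Base64.getDecoder.decode(text)))
    val start = in.readUTF()
    val stop = in.readUTF()
    val count = in.readInt()
    ScanSpec(start, stop, List.fill(count)(in.readUTF()))
  }

  def main(args: Array[String]): Unit = {
    val spec = ScanSpec("row-000", "row-999", List("cf1", "cf2"))
    val text = encode(spec)
    println(text)
    println(decode(text) == spec)
  }
}
```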
The sink tap expects a rowkey in the tuple, and will write other values as columns.
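The sink-side convention can be pictured with the following sketch, again in plain Scala under stated assumptions: a tuple is modelled as a sequence of (field name, value) pairs, and the field name `rowkey` is assumed for the key. The real sink works on Cascading tuples and HBase writes, which this does not attempt to reproduce.

```scala
object SinkSplitSketch {
  // Splits a tuple (modelled as field name -> value pairs) into the rowkey
  // and the remaining fields, which would be written as columns.
  def split(tuple: Seq[(String, String)],
            rowkeyField: String = "rowkey"): (String, Seq[(String, String)]) = {
    val (keys, columns) = tuple.partition { case (name, _) => name == rowkeyField }
    require(keys.size == 1, s"expected exactly one '$rowkeyField' field, found ${keys.size}")
    (keys.head._2, columns)
  }

  def main(args: Array[String]): Unit = {
    val tuple = Seq("rowkey" -> "r1", "field1" -> "Ada", "field2" -> "Oslo")
    val (key, columns) = split(tuple)
    println(s"rowkey=$key columns=$columns")
  }
}
```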