Adding HBase raw tap #37

Closed
wants to merge 12 commits into from

rore commented Jun 11, 2013

This change adds HBaseRawTap and HBaseRawScheme.

The point of this tap is to remove the need to define the input columns up front, and to give more control over how HBase rows are handled in a Cascading (or Scalding) job.

The source tap outputs (rowkey, row) pairs, where row is the actual row object, so the mapper can collect and manipulate a varying set of columns without predefining them.
For instance, the first mapper in the pipe can transform the row like this (using Scalding syntax):

hbaseSource.map(('rowkey, 'row) -> ('key, 'field1, 'field2, 'field3))

where the output fields can be a combination of different columns in each row.
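
As a rough illustration of what such a first mapping step might do, the sketch below uses plain Java collections as a stand-in for an HBase row (a map from column name to value). It does not use the real HBase or Cascading types, and the names `RowMappingSketch`, `firstPresent`, and `toFields` are hypothetical. Each output field is taken from whichever of its candidate columns happens to exist in the row at hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a "row" is modelled as a map from column name to value.
// The tap described above would hand the mapper an actual HBase row object instead.
class RowMappingSketch {

    // Returns the value of the first candidate column present in the row,
    // or null if the row contains none of them.
    static String firstPresent(Map<String, String> row, List<String> candidates) {
        for (String column : candidates) {
            String value = row.get(column);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Builds an output tuple: the row key first, then one value per output field.
    // Each output field lists the columns it may be read from, so different rows
    // can fill the same field from different columns.
    static List<String> toFields(String rowKey,
                                 Map<String, String> row,
                                 List<List<String>> fieldSources) {
        List<String> tuple = new ArrayList<>();
        tuple.add(rowKey);
        for (List<String> candidates : fieldSources) {
            tuple.add(firstPresent(row, candidates));
        }
        return tuple;
    }
}
```

With three candidate groups, `toFields` returns a four-element tuple corresponding to ('key, 'field1, 'field2, 'field3); a field whose candidate columns are all missing from the row comes back as null.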

The source tap also accepts a base64-encoded scan object, which allows the HBase read to be fully customized.
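
The general idea of carrying a scan definition as a base64 string can be sketched as follows. This is not the HBase API: `ScanSpec` is a hypothetical stand-in for a scan object, and its three fields and byte layout are invented for illustration. The actual tap would serialize a real HBase scan instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;

// Hypothetical stand-in for a scan definition; not an HBase class.
class ScanSpec {
    final String startRow;
    final String stopRow;
    final int caching;

    ScanSpec(String startRow, String stopRow, int caching) {
        this.startRow = startRow;
        this.stopRow = stopRow;
        this.caching = caching;
    }

    // Serializes the fields to bytes, then base64-encodes them so the result
    // can travel as an ordinary string (for example in a job property).
    String toBase64() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(startRow);
            out.writeUTF(stopRow);
            out.writeInt(caching);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverses toBase64: decodes the string and reads the fields back
    // in the same order they were written.
    static ScanSpec fromBase64(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            String start = in.readUTF();
            String stop = in.readUTF();
            int caching = in.readInt();
            return new ScanSpec(start, stop, caching);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Encoding to a string means the job only needs to pass one opaque value to the tap, while the caller keeps full control over what the scan contains.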

The sink tap expects a rowkey in the tuple and writes the other values as columns.
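
The sink-side convention can be pictured with a small sketch that splits a tuple into its row key and the remaining column values. The tuple is modelled as a plain map from field name to value, the class `SinkTupleSketch` and its `split` method are hypothetical, and how the real sink maps field names to column families is not covered here.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: a tuple is modelled as a map from field name to value.
class SinkTupleSketch {
    // Field name taken from the (rowkey, row) naming used for the source tap.
    static final String ROW_KEY_FIELD = "rowkey";

    // Separates the row key from the other fields. The returned entry holds the
    // row key and a copy of the tuple with the row key field removed; every
    // remaining field would be written as one column.
    static Map.Entry<String, Map<String, String>> split(Map<String, String> tuple) {
        String rowKey = tuple.get(ROW_KEY_FIELD);
        if (rowKey == null) {
            throw new IllegalArgumentException("tuple has no '" + ROW_KEY_FIELD + "' field");
        }
        Map<String, String> columns = new LinkedHashMap<>(tuple);
        columns.remove(ROW_KEY_FIELD);
        return new AbstractMap.SimpleEntry<>(rowKey, columns);
    }
}
```

Rejecting a tuple without a row key early keeps the failure close to the pipe that produced the tuple, which is one reasonable choice among several.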

@@ -0,0 +1,81 @@
<?xml version="1.0" encoding="UTF-8"?>
Contributor

Sorry to nitpick, but would you mind removing the POM? I think we should convert to maven soon, but probably in a different pull req.

@sritchie
Contributor

Hey, sorry for the delay -- the right way to go might be to get the functionality in that other HBase library, given that we've been neglecting maple a bit. @azymnis is going to take a look shortly. Awesome work!

@rore
Author

rore commented Jun 13, 2013

OK. I think I will create a pull request against SpyGlass; it seems to make sense to consolidate the HBase support under one project.
If you do want to merge it into maple, I'll remove the POM.

@sritchie
Contributor

I think SpyGlass is the right place for the pull req. Very timely. I'll go ahead and close this request in favor of that.

sritchie closed this Jun 13, 2013