Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Spark Dataset support #814

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

Tomczik76
Copy link

No description provided.

@CLAassistant
Copy link

CLAassistant commented Apr 29, 2020

CLA assistant check
All committers have signed the CLA.

@johnynek
Copy link
Collaborator

This is great! thanks for the PR. I'd like to merge it. you need to accept the contributor agreement above.

Also, can you run scalafmtAll at the sbt prompt and commit the results?


package object implicits {

import com.twitter.algebird.BF
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move the imports to the top of the file?

/**
* spark exposes an Aggregator type, so this is here to avoid shadowing
*/
type AlgebirdAggregator[A, B, C] = Aggregator[A, B, C]
val AlgebirdAggregator = Aggregator

implicit class ToAlgebird[T](val rdd: RDD[T]) extends AnyVal {
implicit class ToAlgebirdRDD[T](val rdd: RDD[T]) extends AnyVal {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we not change the name of this? this will break source and binary compatibility for people.

}

after {
// try spark.stop()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that we can remove this after block?

@@ -0,0 +1,29 @@
package com.twitter.algebird.spark

package object implicits {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would actually create an EncoderInstances trait and have package object spark extends EncoderInstances .

Users will do import com.twitter.algebird.spark._ and they will get these lower priority implicits. I can see that this was to possibly be inline with spark way of things, but I think this is more consistent with the rest of algebird.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this and it didn't work with spark.implicits._. So I'm moving them into the package object spark instead.


import scala.reflect.ClassTag
implicit def kryoPriorityQueueEncoder[A](implicit ct: ClassTag[PriorityQueue[A]]): Encoder[PriorityQueue[A]] =
org.apache.spark.sql.Encoders.kryo[PriorityQueue[A]](ct)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
org.apache.spark.sql.Encoders.kryo[PriorityQueue[A]](ct)
org.apache.spark.sql.Encoders.kryo[PriorityQueue[A]]

iirc ClassTag is a context bound so the above should be possible?

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Dataset
import com.twitter.algebird.BloomFilter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems unused?

* above to at least check compilation
*/
test("aggregate") {
val sparkImplicits = spark.implicits
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would move these to #L24 to be more evident that these are indeed the SparkSession implicitis

@johnynek
Copy link
Collaborator

johnynek commented May 7, 2020

@Tomczik76 I'd love to include this. Any idea when you might have time to get back around to it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants