[SYSTEMDS-3541] Exploratory workload-aware compression on intermediates #2230
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2230 +/- ##
============================================
+ Coverage 72.46% 72.58% +0.11%
- Complexity 45453 45572 +119
============================================
Files 1469 1469
Lines 170893 171141 +248
Branches 33325 33377 +52
============================================
+ Hits 123846 124221 +375
+ Misses 37630 37516 -114
+ Partials 9417 9404 -13
View full report in Codecov by Sentry.
Very good!
However, a few small optimizations could get a bit more performance out of it.
src/main/java/org/apache/sysds/hops/rewrite/RewriteCompressedReblock.java
src/main/java/org/apache/sysds/runtime/compress/lib/CLALibBinaryCellOp.java
src/test/java/org/apache/sysds/test/component/compress/lib/CLALibBinaryCellOpTest.java
Added a config option for aggressive compression and extended the compression workload analyzer to detect aggregation operations and binary matrix-vector operations when the inputs are compressed as a single column group. Updated the cost estimation for compressing already compressed inputs and removed scalars from the compressible intermediate candidates. Added support for double-compressed binary matrix-matrix operations and implemented both single-threaded and multithreaded compressed binary matrix-vector operations with single column group encoding. Removed the relaxed compression threshold and added a logging statement that flags potential improvements in compressed binary matrix-vector operations. Enabled always-on sampling for binary matrix-vector operations in CLALibBinaryCellOp, expanded the test coverage, and introduced a new compression algorithm test case for k-means with intermediate compression enabled. Also extended the CLALibBinaryCellOp binary matrix-vector (sparse and dense) op task to support both left and right operations.
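To illustrate why a binary matrix-vector operation can stay compressed, here is a minimal sketch that assumes a dictionary-style encoding in which every row points at one of a few distinct tuples. It is not the SystemDS implementation: the class `DictBinaryRowVectorSketch`, the type `Encoded`, and the methods `applyRowVector` and `decompress` are hypothetical names invented for this example. The sketch only shows the general idea that the operator needs to touch each distinct tuple once, with a flag for whether the vector is the left or the right operand.

```java
import java.util.function.DoubleBinaryOperator;

/** Hypothetical illustration only; none of these names exist in SystemDS. */
public class DictBinaryRowVectorSketch {

    /** Dictionary-encoded matrix: logical row r equals dict[map[r]]. */
    static final class Encoded {
        final double[][] dict; // distinct tuples, one value per column
        final int[] map;       // per-row index into dict

        Encoded(double[][] dict, int[] map) {
            this.dict = dict;
            this.map = map;
        }
    }

    /**
     * Applies a cell-wise binary operator between the matrix and a row vector
     * by rewriting only the dictionary. The row-to-tuple mapping is reused,
     * so the work is proportional to the number of distinct tuples.
     */
    static Encoded applyRowVector(Encoded m, double[] v, DoubleBinaryOperator op, boolean vectorOnLeft) {
        double[][] out = new double[m.dict.length][];
        for (int i = 0; i < m.dict.length; i++) {
            double[] tuple = m.dict[i];
            out[i] = new double[tuple.length];
            for (int c = 0; c < tuple.length; c++)
                out[i][c] = vectorOnLeft
                    ? op.applyAsDouble(v[c], tuple[c])   // left: v op M
                    : op.applyAsDouble(tuple[c], v[c]);  // right: M op v
        }
        return new Encoded(out, m.map);
    }

    /** Materializes the dense rows; used here only to inspect results. */
    static double[][] decompress(Encoded m) {
        double[][] rows = new double[m.map.length][];
        for (int r = 0; r < m.map.length; r++)
            rows[r] = m.dict[m.map[r]].clone();
        return rows;
    }

    public static void main(String[] args) {
        // Four logical rows, but only two distinct tuples.
        Encoded m = new Encoded(new double[][] {{1, 2}, {3, 4}}, new int[] {0, 1, 0, 0});
        double[] v = {10, 20};
        Encoded right = applyRowVector(m, v, (a, b) -> a - b, false);
        Encoded left = applyRowVector(m, v, (a, b) -> a - b, true);
        System.out.println(java.util.Arrays.deepToString(decompress(right)));
        System.out.println(java.util.Arrays.deepToString(decompress(left)));
    }
}
```

In this sketch the subtraction runs over two tuples instead of four rows, and non-commutative operators such as `-` give different results depending on the `vectorOnLeft` flag, which is why both directions need explicit handling.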
This PR explores aggressive compression of intermediates, specifically for the built-in kmeans algorithm. It adds new compressed operations that avoid decompression and minimize the time spent compressing intermediates.
The runtime of the kmeans algorithm on the census dataset dropped from an initial 50s with intermediate compression to 17.5s with all optimizations applied. This is an overall improvement of about 33% over the 27s baseline runtime of workload-aware, non-aggressive compression.
A summary of the changes: