Commit bdb2bc4

Merge pull request #59 from GPUEngineering/feature/58-serialise-binary
Serialisation: text and binary
2 parents b633bed + 67ccce9 commit bdb2bc4

5 files changed: +134 -26 lines changed

CHANGELOG.md

+17 -2
@@ -5,14 +5,29 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+<!-- ---------------------
+v1.6.0
+--------------------- -->
+## v1.6.0 - 2-12-2024
+
+### Added
+
+- Method `saveToFile` becomes `DTensor<T>::saveToFile(std::string pathToFile, Serialisation ser)`, i.e., the user can
+choose whether to save the file as a text (ASCII) file, or a binary one
+
+### Changed
+
+- Method `parseFromTextFile` renamed to `parseFromFile` (supports text and binary formats)
+
+
 <!-- ---------------------
 v1.5.2
 --------------------- -->
 ## v1.5.2 - 1-12-2024

 ### Fixed

-- Quick bug bix in `DTensor::parseFromTextFile` (passing storage mode to `vectorFromFile`)
+- Quick bug bix in `DTensor::parseFromTextFile` (passing storage mode to `vectorFromTextFile`)


 <!-- ---------------------
@@ -23,7 +38,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed

 - Set precision in `DTensor::saveToFile` properly
-- `DTensor<T>::parseFromTextFile` throws `std::invalid_argument` if `T` is unsupported
+- `DTensor<T>::parseFromFile` throws `std::invalid_argument` if `T` is unsupported

 <!-- ---------------------
 v1.5.0

README.md

+12 -4
@@ -248,7 +248,8 @@ The `DTensor` `B` will be overwritten with the solution.

 ### 1.8. Saving and loading tensors

-Tensor data can be stored in simple text files which have the following structure
+Tensor data can be stored in simple text files or binary files.
+The text-based format has the following structure

 ```text
 number_of_rows
@@ -259,13 +260,20 @@ data (one entry per line)

 To save a tensor in a file, simply call `DTensor::saveToFile(filename)`.

-To load a tensor from a file, the static function `DTensor<T>::parseFromTextFile(filename)` can be used. For example:
+If the file extension is `.bt` (binary tensor), the data will be stored in binary format.
+The structure of the binary encoding is similar to that of the text encoding:
+the first three `uint64_t`-sized positions correspond to the number of rows, columns
+and matrices, followed by the elements of the tensor.
+
+To load a tensor from a file, the static function `DTensor<T>::parseFromFile(filename)` can be used. For example:

 ```c++
-auto z = DTensor<double>::parseFromTextFile("path/to/my.dtensor")
+auto z = DTensor<double>::parseFromFile("path/to/my.dtensor")
 ```

-If necessary, you can provide a second argument to `parseFromTextFile` to specify the order in which the data are stored (the `StorageMode`).
+If necessary, you can provide a second argument to `parseFromFile` to specify the order in which the data are stored (the `StorageMode`).
+
+Soon we will release a Python API for reading and serialising (numpy) arrays to `.bt` files.

 ## 2. Cholesky factorisation and system solution
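
Note on the `.bt` layout described in the README change: the format is three `uint64_t` values (rows, columns, matrices) followed by the raw tensor elements. The following is a minimal host-only sketch of that layout, not the library's implementation. `BtData`, `writeBt`, `readBt` and `roundTrip` are hypothetical names introduced for illustration. The sketch assumes values are written in the host's native byte order (the README does not specify one) and adds stream checks that the committed code does not perform.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical container for illustration; this is not the library's data_t.
template<typename T>
struct BtData {
    std::uint64_t numRows = 0, numCols = 0, numMats = 0;
    std::vector<T> data;
};

// Writes the header (rows, cols, mats as uint64_t) followed by the raw elements.
// Assumption: native byte order, so files are not portable across endianness.
template<typename T>
void writeBt(std::ostream &out, const BtData<T> &t) {
    assert(t.data.size() == t.numRows * t.numCols * t.numMats);
    out.write(reinterpret_cast<const char *>(&t.numRows), sizeof(std::uint64_t));
    out.write(reinterpret_cast<const char *>(&t.numCols), sizeof(std::uint64_t));
    out.write(reinterpret_cast<const char *>(&t.numMats), sizeof(std::uint64_t));
    out.write(reinterpret_cast<const char *>(t.data.data()),
              static_cast<std::streamsize>(t.data.size() * sizeof(T)));
    if (!out) throw std::runtime_error("failed to write binary tensor");
}

// Reads the header, then all elements in one bulk read; throws on truncated input.
template<typename T>
BtData<T> readBt(std::istream &in) {
    BtData<T> t;
    in.read(reinterpret_cast<char *>(&t.numRows), sizeof(std::uint64_t));
    in.read(reinterpret_cast<char *>(&t.numCols), sizeof(std::uint64_t));
    in.read(reinterpret_cast<char *>(&t.numMats), sizeof(std::uint64_t));
    if (!in) throw std::runtime_error("truncated binary tensor header");
    t.data.resize(t.numRows * t.numCols * t.numMats);
    in.read(reinterpret_cast<char *>(t.data.data()),
            static_cast<std::streamsize>(t.data.size() * sizeof(T)));
    if (!in) throw std::runtime_error("truncated binary tensor data");
    return t;
}

// In-memory round trip, handy for testing without touching the disk.
template<typename T>
BtData<T> roundTrip(const BtData<T> &t) {
    std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
    writeBt(buf, t);
    return readBt<T>(buf);
}
```

Because the elements are copied byte for byte, a round trip through this layout should reproduce the data exactly, so an equality comparison is enough to check it.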

include/tensor.cuh

+68 -13
@@ -169,6 +169,7 @@ enum StorageMode {
     defaultMajor = columnMajor
 };

+
 /**
  * This library uses tensors to store and manipulate data on a GPU device.
  * A tensor has three axes: [rows (m) x columns (n) x matrices (k)].
@@ -256,13 +257,16 @@ public:
      *
      * This static function reads data from a text file, creates a DTensor and uploads the data to the device.
      *
+     * The data may be stored in a text file or a binary file. Binary files must have the extension .bt.
+     *
      * @param path_to_file path to file as string
      * @param mode storage mode (default: StorageMode::defaultMajor)
      * @return instance of DTensor
      *
      * @throws std::invalid_argument if the file is not found
      */
-    static DTensor<T> parseFromTextFile(std::string path_to_file, StorageMode mode = StorageMode::defaultMajor);
+    static DTensor<T> parseFromFile(std::string path_to_file,
+                                    StorageMode mode = StorageMode::defaultMajor);

     /**
      * Constructs a DTensor object.
@@ -504,7 +508,12 @@ public:
     /**
      * Saves the current instance of DTensor to a (text) file
      *
-     * @param pathToFile
+     * If the file extension is .bt, the data will be stored in a binary file.
+     * Writing to and reading from a binary file is significantly faster and
+     * the generated binary files tend to have a smaller size (about 40% of the
+     * size of text files for data of type double and float).
+     *
+     * @param pathToFile path to file
      */
     void saveToFile(std::string pathToFile);

@@ -595,7 +604,7 @@ struct data_t {
 };

 template<typename T>
-data_t<T> vectorFromFile(std::string path_to_file) {
+data_t<T> vectorFromTextFile(std::string path_to_file) {
     data_t<T> dataStruct;
     std::ifstream file;
     file.open(path_to_file, std::ios::in);
@@ -641,24 +650,70 @@ data_t<T> vectorFromFile(std::string path_to_file) {
 }

 template<typename T>
-DTensor<T> DTensor<T>::parseFromTextFile(std::string path_to_file,
-                                         StorageMode mode) {
-    auto parsedData = vectorFromFile<T>(path_to_file);
+data_t<T> vectorFromBinaryFile(std::string path_to_file) {
+    data_t<T> dataStruct;
+    /* Read from binary file */
+    std::ifstream inFile;
+    inFile.open(path_to_file, std::ios::binary);
+    inFile.read(reinterpret_cast<char *>(&(dataStruct.numRows)), sizeof(uint64_t));
+    inFile.read(reinterpret_cast<char *>(&(dataStruct.numCols)), sizeof(uint64_t));
+    inFile.read(reinterpret_cast<char *>(&(dataStruct.numMats)), sizeof(uint64_t));
+    uint64_t numElements = dataStruct.numRows * dataStruct.numCols * dataStruct.numMats;
+    std::vector<T> vecDataFromFile(numElements);
+    for (size_t i = 0; i < numElements; i++) {
+        T el;
+        inFile.read(reinterpret_cast<char *>(&el), sizeof(T));
+        vecDataFromFile[i] = el;
+    }
+    inFile.close();
+    dataStruct.data = vecDataFromFile;
+    return dataStruct;
+}
+
+template<typename T>
+DTensor<T> DTensor<T>::parseFromFile(std::string path_to_file,
+                                     StorageMode mode) {
+    // Figure out file extension
+    size_t pathToFileLength = path_to_file.length() ;
+    std::string fileNameExtension = path_to_file.substr(pathToFileLength-3);
+    typedef data_t<T> (*PARSER)(std::string);
+    PARSER parser = (fileNameExtension == ".bt") ? vectorFromBinaryFile<T> : vectorFromTextFile<T>;
+    auto parsedData = parser(path_to_file);
     DTensor<T> tensorFromData(parsedData.data, parsedData.numRows, parsedData.numCols, parsedData.numMats, mode);
     return tensorFromData;
 }

 template<typename T>
 void DTensor<T>::saveToFile(std::string pathToFile) {
-    std::ofstream file(pathToFile);
-    file << numRows() << std::endl << numCols() << std::endl << numMats() << std::endl;
-    std::vector<T> myData(numEl()); download(myData);
-    if constexpr (std::is_floating_point<T>::value) {
-        file << std::setprecision(std::numeric_limits<T>::max_digits10);
-    }
-    for(const T& el : myData) file << el << std::endl;
+    std::vector<T> myData(numEl());
+    download(myData);
+
+    // Figure out file extension
+    size_t pathToFileLength = pathToFile.length() ;
+    std::string fileNameExtension = pathToFile.substr(pathToFileLength-3);
+    // If the extension is .bt...
+    if (fileNameExtension == ".bt") {
+        uint64_t nr = (uint64_t) numRows(),
+                nc = (uint64_t) numCols(),
+                nm = (uint64_t) numMats();
+        std::ofstream outFile;
+        outFile.open(pathToFile, std::ios::binary);
+        outFile.write(reinterpret_cast<const char *>(&nr), sizeof(uint64_t));
+        outFile.write(reinterpret_cast<const char *>(&nc), sizeof(uint64_t));
+        outFile.write(reinterpret_cast<const char *>(&nm), sizeof(uint64_t));
+        for (const T &el: myData) outFile.write(reinterpret_cast<const char *>(&el), sizeof(T));
+        outFile.close();
+    } else {
+        std::ofstream file(pathToFile);
+        file << numRows() << std::endl << numCols() << std::endl << numMats() << std::endl;
+        if constexpr (std::is_floating_point<T>::value) {
+            file << std::setprecision(std::numeric_limits<T>::max_digits10);
+        }
+        for (const T &el: myData) file << el << std::endl;
+    }
 }

+
 template<typename T>
 void DTensor<T>::reshape(size_t newNumRows, size_t newNumCols, size_t newNumMats) {
     if (m_numRows == newNumRows && m_numCols == newNumCols && m_numMats == newNumMats) return;
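
Note on the extension check in `parseFromFile` and `saveToFile`: both compute `substr(length - 3)`. For a path shorter than three characters the unsigned subtraction wraps around, and `std::string::substr` throws `std::out_of_range` when the start position exceeds the string size. A possible alternative is a suffix test that first compares lengths. The sketch below is only a suggestion; `endsWith`, `isBinaryTensorPath` and `checkExtensionHelper` are hypothetical names that do not appear in the commit.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if `path` ends with `suffix`.
// The size check comes first, so short paths never index out of range.
inline bool endsWith(const std::string &path, const std::string &suffix) {
    return path.size() >= suffix.size()
           && path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper mirroring the commit's rule: ".bt" selects the binary format.
inline bool isBinaryTensorPath(const std::string &path) {
    return endsWith(path, ".bt");
}

// Quick sanity checks; call once from a test or from main().
inline void checkExtensionHelper() {
    assert(isBinaryTensorPath("tensor.bt"));
    assert(!isBinaryTensorPath("my.dtensor"));
    assert(!isBinaryTensorPath("bt"));  // shorter than the suffix, no exception
}
```

With such a helper, the dispatch could read `isBinaryTensorPath(path_to_file) ? vectorFromBinaryFile<T> : vectorFromTextFile<T>`, leaving the rest of the function unchanged.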

main.cu

+10 -4
@@ -6,9 +6,15 @@


 int main() {
-    auto z = DTensor<size_t>::parseFromTextFile("../test/data/my.dtensor",
-                                                StorageMode::rowMajor);
-    std::cout << z;
-    z.saveToFile("hohoho.dtensor");
+    /* Write to binary file */
+    auto r = DTensor<double>::createRandomTensor(3, 6, 4, -1, 1);
+    std::string fName = "tensor.bt"; // binary tensor file extension: .bt
+    r.saveToFile(fName);
+
+    /* Parse binary file */
+    auto recov = DTensor<double>::parseFromFile(fName);
+    auto err = r - recov;
+    std::cout << "max error : " << err.maxAbs();
+
     return 0;
 }
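
Note on file sizes: the doc comment added to `saveToFile` states that binary files are about 40% of the size of the corresponding text files. A rough way to check this without a GPU is to compute both encodings' sizes in memory. The sketch below is illustrative only. `textSizeBytes`, `binarySizeBytes`, `sampleData` and `binaryToTextRatio` are hypothetical names. The sample data are deterministic values in [-1, 1] standing in for `createRandomTensor`, and `'\n'` is assumed to be a one-byte line ending.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <limits>
#include <sstream>
#include <vector>

// Size of the text encoding: three header lines, then one value per line,
// printed with max_digits10 significant digits as saveToFile does.
template<typename T>
std::size_t textSizeBytes(const std::vector<T> &data,
                          std::size_t nR, std::size_t nC, std::size_t nM) {
    std::ostringstream out;
    out << nR << '\n' << nC << '\n' << nM << '\n';
    out << std::setprecision(std::numeric_limits<T>::max_digits10);
    for (const T &el : data) out << el << '\n';
    return out.str().size();
}

// Size of the binary encoding: three uint64_t header fields plus raw elements.
template<typename T>
std::size_t binarySizeBytes(const std::vector<T> &data) {
    return 3 * sizeof(std::uint64_t) + data.size() * sizeof(T);
}

// Deterministic sample values in [-1, 1]; an assumption, not the library's generator.
template<typename T>
std::vector<T> sampleData(std::size_t n) {
    std::vector<T> v(n);
    for (std::size_t i = 0; i < n; i++) {
        v[i] = static_cast<T>(std::sin(static_cast<double>(i + 1)));
    }
    return v;
}

// Ratio of binary size to text size for an nR x nC x nM tensor of sample data.
template<typename T>
double binaryToTextRatio(std::size_t nR, std::size_t nC, std::size_t nM) {
    std::vector<T> v = sampleData<T>(nR * nC * nM);
    assert(v.size() == nR * nC * nM);
    return static_cast<double>(binarySizeBytes(v))
           / static_cast<double>(textSizeBytes(v, nR, nC, nM));
}
```

Calling `binaryToTextRatio<double>(3, 6, 4)` uses the same dimensions as the tensor in `main.cu`. The ratio depends on the data: values with short decimal representations, such as `0.5`, take few characters in the text format, so the figure is best read as typical for random data.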

test/testTensor.cu

+27 -3
@@ -117,7 +117,7 @@ TEST_F(TensorTest, randomTensorCreation) {
 }

 /* ---------------------------------------
- * Save to file and parse
+ * Save to file and parse (text)
  * --------------------------------------- */

 TEMPLATE_WITH_TYPE_T
@@ -128,7 +128,7 @@ void parseTensorFromFile() {
     auto r = DTensor<T>::createRandomTensor(nR, nC, nM, -1, 1);
     std::string fName = "myTest.dtensor";
     r.saveToFile(fName);
-    auto a = DTensor<T>::parseFromTextFile(fName);
+    auto a = DTensor<T>::parseFromFile(fName);
     EXPECT_EQ(nR, a.numRows());
     EXPECT_EQ(nC, a.numCols());
     EXPECT_EQ(nM, a.numMats());
@@ -148,7 +148,31 @@ TEST_F(TensorTest, parseTensorUnsupportedDataType) {
     auto r = DTensor<double>::createRandomTensor(nR, nC, nM, -1, 1);
     std::string fName = "myTest.dtensor";
     r.saveToFile(fName);
-    EXPECT_THROW(DTensor<char>::parseFromTextFile(fName), std::invalid_argument);
+    EXPECT_THROW(DTensor<char>::parseFromFile(fName), std::invalid_argument);
+}
+
+/* ---------------------------------------
+ * Save to file and parse (binary)
+ * --------------------------------------- */
+
+TEMPLATE_WITH_TYPE_T
+void parseTensorFromFileBinary() {
+    size_t nR = 20, nC = 40, nM = 6;
+    auto r = DTensor<T>::createRandomTensor(nR, nC, nM, -1, 1);
+    std::string fName = "myTest.bt";
+    r.saveToFile(fName);
+    auto a = DTensor<T>::parseFromFile(fName);
+    EXPECT_EQ(nR, a.numRows());
+    EXPECT_EQ(nC, a.numCols());
+    EXPECT_EQ(nM, a.numMats());
+    auto diff = a - r;
+    T err = diff.maxAbs();
+    EXPECT_LT(err, 2 * std::numeric_limits<T>::epsilon());
+}
+
+TEST_F(TensorTest, parseTensorFromFileBinary) {
+    parseTensorFromFileBinary<float>();
+    parseTensorFromFileBinary<double>();
 }

 /* ---------------------------------------
