This is a TF2 implementation of the paper: Self-supervised Deep Metric Learning for Pointsets, ICDE 2021 and the extended version: Towards Pointsets Representation Learning via Self-Supervised Learning and Set Augmentation, TPAMI. We proposed a novel deep representation learning technique for pointset using zero training labels called "Weighted Self-supervised EMD Triplet loss." You can find the link of this paper here. [ICDE version] [TPAMI version]
We follow Matthew J Kusner's paper "From Word Embeddings to Document Distances" by using set of word vectors dataset provided in this repository
and placing .mat
files into data/
folder. They also provided codes for converting raw text data into sets of word vectors too.
Also, we need to calculate pairwise EMD for each sample in the dataset using Python EMD module provided in the same repository by converting the output in .pk
into .csv
and placing it inside data/wmd/
.
In this repository, we provided BBCSports dataset for both pointset data and calculated EMD as the example.
- Python 3.8
- pip
You can create a new environment and install required packages using:
pip install -r requirements.txt
python train.py --dsname bbcsport --batchsize 32 --lr 1e-4 --epoch 1000 --loss gaussian --treshold 5
- To use Weighted Self-supervised EMD Triplet loss for training, please use
--loss gaussian
. - To use approximate EMD loss, please use
--loss
.
python train.py --dsname bbcsport --batchsize 32 --lr 1e-4 --epoch 1000 --loss triplet --treshold 5 --istransfer --loadmodel pretrained_bbcsport.h5
You can set the flag to train in fine-tuning step by --istransfer
and specify a directory of the pretrained model using --loadmodel directory\model.h5
.
We fine-tune a model using --loss ce
(crossentropy loss) and --loss triplet
(semi-hard negative mining triplet loss), which require labels.
pattaramanee dot a underscore s19 at vistec dot ac dot th
@INPROCEEDINGS{WSSET
author={P. {Arsomngern} and C. {Long} and S. {Suwajanakorn} and S. {Nutanong}},
booktitle={2021 IEEE 37th International Conference on Data Engineering (ICDE)},
title={Self-Supervised Deep Metric Learning for Pointsets},
year={2021}}
@INPROCEEDINGS{
@ARTICLE{9665285,
author={Arsomngern, Pattaramanee and Long, Cheng and Suwajanakorn, Supasorn and Nutanong, Sarana},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Towards Pointsets Representation Learning via Self-Supervised Learning and Set Augmentation},
year={2021},
volume={},
number={},
pages={1-1},
doi={10.1109/TPAMI.2021.3139113}}
We use Keras implementation of self-attention module from CyberZHG repository and modifying semi-hard negative mining triplet loss from Tensorflow Addons repository.