generated from jobindjohn/obsidian-publish-mkdocs
Commit
* PUSH NOTE : Thomas Kipf.md * PUSH NOTE : Tim R. Davidson.md * PUSH NOTE : Simeng Sun.md * PUSH NOTE : Ilya Loshchilov.md * PUSH NOTE : Cheng-Ping Hsieh.md * PUSH NOTE : Boris Ginsburg.md * PUSH NOTE : Hyperspherical Variational Auto-Encoders.md * PUSH ATTACHMENT : Pasted image 20241010115957.png * PUSH ATTACHMENT : Pasted image 20241010115603.png * PUSH NOTE : nGPT - Normalized Transformer with Representation Learning on the Hypersphere.md * PUSH ATTACHMENT : Pasted image 20241010085554.png * PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 13.md
Showing 12 changed files with 91 additions and 1 deletion.
24 changes: 24 additions & 0 deletions
.../100 Reference notes/101 Literature/Hyperspherical Variational Auto-Encoders.md
@@ -0,0 +1,24 @@
---
authors:
- "[[Tim R. Davidson|Tim R. Davidson]]"
- "[[Luca Falorsi|Luca Falorsi]]"
- "[[Nicola de Cao|Nicola de Cao]]"
- "[[Thomas Kipf|Thomas Kipf]]"
- "[[Jakub M. Tomczak|Jakub M. Tomczak]]"
year: 2018
tags:
- paper
- geometric_dl
url: https://arxiv.org/abs/1804.00891
share: true
---
> [!tldr] Abstract
> The Variational Auto-Encoder (VAE) is one of the most used unsupervised machine learning models. But although the default choice of a Gaussian distribution for both the prior and posterior represents a mathematically convenient distribution often leading to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is more suitable for capturing data with a hyperspherical latent structure, while outperforming a normal, $\mathcal{N}$-VAE, in low dimensions on other data types. Code at [this http URL](http://github.com/nicola-decao/s-vae-tf) and [this https URL](https://github.com/nicola-decao/s-vae-pytorch)
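As a minimal illustration of the difference between the two parameterizations (my own sketch, not the paper's code; assumes SciPy ≥ 1.11, which ships `scipy.stats.vonmises_fisher`): a Gaussian latent has arbitrary norm, while a vMF latent is a direction on the unit hypersphere.

```python
import numpy as np
from scipy.stats import vonmises_fisher

rng = np.random.default_rng(0)

# N-VAE style latent: reparameterized Gaussian sample z = mu + sigma * eps
mu, sigma = np.array([1.0, 0.0, 0.0]), 0.5
z_gauss = mu + sigma * rng.standard_normal(3)

# S-VAE style latent: a point on the unit sphere S^2, drawn from a
# von Mises-Fisher distribution with mean direction mu_dir and concentration kappa
mu_dir = mu / np.linalg.norm(mu)
z_vmf = vonmises_fisher(mu_dir, kappa=20.0).rvs(1)[0]

print(np.linalg.norm(z_gauss))  # arbitrary norm
print(np.linalg.norm(z_vmf))    # ~1.0: the latent lives on the hypersphere
```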
# Notes
- "However, even for m>20 we observe a vanishing surface problem (see Figure [6](https://ar5iv.labs.arxiv.org/html/1804.00891#A5.F6 "Figure 6 ‣ Appendix E COLLAPSE OF THE SURFACE AREA ‣ Hyperspherical Variational Auto-Encoders") in Appendix [E](https://ar5iv.labs.arxiv.org/html/1804.00891#A5 "Appendix E COLLAPSE OF THE SURFACE AREA ‣ Hyperspherical Variational Auto-Encoders")). This could thus lead to unstable behavior of hyperspherical models in high dimensions." | ||
- Basically, the hypesphere's surface area starts collapsing on high dimensions (m>20), which makes it unsuitable choice, as embeddings in this manifold lose discriminative power. This is backed by the paper's results, where s-vae outperforms n-vae up to d=40. | ||
- ![[Pasted image 20241010115957.png|Pasted image 20241010115957.png]]
![[Pasted image 20241010115603.png|Pasted image 20241010115603.png]]
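The surface-area collapse itself is easy to check numerically. A short sketch (my own, not from the paper) using the standard formula for the surface area of the unit sphere $S^{m-1} \subset \mathbb{R}^m$, $A_m = 2\pi^{m/2} / \Gamma(m/2)$:

```python
import numpy as np
from scipy.special import gammaln

def unit_sphere_area(m: int) -> float:
    """Surface area of the unit sphere S^(m-1) embedded in R^m: 2 * pi^(m/2) / Gamma(m/2)."""
    return float(np.exp(np.log(2.0) + (m / 2.0) * np.log(np.pi) - gammaln(m / 2.0)))

for m in (3, 7, 20, 40, 100):
    print(f"m={m:3d}  area={unit_sphere_area(m):.3e}")

# The area peaks around m ~ 7 and then decays toward zero, which is the
# "vanishing surface problem" the quote above refers to.
```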
23 changes: 23 additions & 0 deletions
...GPT - Normalized Transformer with Representation Learning on the Hypersphere.md
@@ -0,0 +1,23 @@
---
authors:
- "[[Ilya Loshchilov|Ilya Loshchilov]]"
- "[[Cheng-Ping Hsieh|Cheng-Ping Hsieh]]"
- "[[Simeng Sun|Simeng Sun]]"
- "[[Boris Ginsburg|Boris Ginsburg]]"
year: 2024
tags:
- paper
- efficient_dl
- transformers
url: https://arxiv.org/abs/2410.01131
share: true
---
> [!tldr] Abstract
> We propose a novel neural network architecture, the normalized Transformer (nGPT) with representation learning on the hypersphere. In nGPT, all vectors forming the embeddings, MLP, attention matrices and hidden states are unit norm normalized. The input stream of tokens travels on the surface of a hypersphere, with each layer contributing a displacement towards the target output predictions. These displacements are defined by the MLP and attention blocks, whose vector components also reside on the same hypersphere. Experiments show that nGPT learns much faster, reducing the number of training steps required to achieve the same accuracy by a factor of 4 to 20, depending on the sequence length.

![[Pasted image 20241010085554.png|Pasted image 20241010085554.png]]
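To make the abstract's description concrete, here is a toy sketch of the idea (my own simplification, not the paper's implementation): the hidden state is kept on the unit hypersphere, and each block output nudges it along the sphere by a small step before re-normalizing.

```python
import numpy as np

def normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Project a vector back onto the unit hypersphere."""
    return x / (np.linalg.norm(x) + eps)

rng = np.random.default_rng(0)
d = 64
h = normalize(rng.standard_normal(d))  # token representation starts on the sphere

for _ in range(4):  # a few "layers"
    # Stand-in for a normalized attention/MLP block output (also on the sphere).
    block_out = normalize(rng.standard_normal(d))
    alpha = 0.1  # per-layer step size; the paper learns such scaling factors
    # Move toward the block's suggestion, then re-project onto the hypersphere.
    h = normalize(h + alpha * (block_out - h))

print(np.linalg.norm(h))  # stays ~1.0: the representation travels on the hypersphere
```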
# Notes
- Interesting, since [[Hyperspherical Variational Auto-Encoders|Hyperspherical Variational Auto-Encoders]] claims that high-dimensional hyperspheres are not well suited for embeddings due to the vanishing surface problem. However, the nGPT paper claims that hypersphere embeddings are beneficial for training transformers. There's some discussion at [Twitter](https://x.com/maksym_andr/status/1843923528502129122).
@@ -0,0 +1,5 @@
---
affiliation:
- "[[NVIDIA|NVIDIA]]"
share: true
---
@@ -0,0 +1,5 @@
---
affiliation:
- "[[NVIDIA|NVIDIA]]"
share: true
---
@@ -0,0 +1,5 @@
---
affiliation:
- "[[NVIDIA|NVIDIA]]"
share: true
---
@@ -0,0 +1,5 @@
---
affiliation:
- "[[NVIDIA|NVIDIA]]"
share: true
---
@@ -0,0 +1,6 @@
---
affiliation:
- "[[University of Amsterdam|University of Amsterdam]]"
- "[[Google DeepMind|Google DeepMind]]"
share: true
---
@@ -0,0 +1,6 @@
---
affiliation:
- "[[University of Amsterdam|University of Amsterdam]]"
- "[[EPFL|EPFL]]"
share: true
---
The three PNG attachments added in this commit (Pasted image 20241010115957.png, Pasted image 20241010115603.png, Pasted image 20241010085554.png) are binary files and are not rendered in the diff view.