Commit 4e4a9c6

fixed 1-4
1 parent 728641b commit 4e4a9c6

1 file changed: +29 -29 lines changed

Diff for: README.md

@@ -67,9 +67,9 @@
 [6] RITA: Group Attention is All You Need for Timeseries Analytics. https://arxiv.org/abs/2306.01926.
 [7] FlashAttention: Fast and Memory‐Efficient Exact Attention with IO‐Awareness. https://arxiv.org/abs/2205.14135.
 [8] Language Identification. https://en.wikipedia.org/wiki/Language_identification.
-[9] FastText Model for Language Identification. https://huggingface.co/facebook/fasttextlanguageidentification.
+[9] FastText Model for Language Identification. https://huggingface.co/facebook/fasttext-language-identification.
 [10] Transformer‐XL. https://arxiv.org/abs/1901.02860.
-[11] Byte‐Pair Encoding Tokenization. https://huggingface.co/learn/nlpcourse/en/chapter6/5.
+[11] Byte‐Pair Encoding Tokenization. https://huggingface.co/learn/nlp-course/en/chapter6/5.
 [12] SentencePiece Tokenization. https://arxiv.org/abs/1808.06226.
 [13] Tiktoken Library. https://github.com/openai/tiktoken.
 [14] Google’s Gemini. https://gemini.google.com/.
@@ -80,44 +80,44 @@
 [19] OpenAI’s Models. https://platform.openai.com/docs/models.
 [20] Meta’s LLaMA. https://llama.meta.com/.
 [21] Introduction to Transformers by Andrej Karpathy. https://www.youtube.com/watch?v=XfpMkf4rD6E.
-[22] Transformer Visualized. https://jalammar.github.io/illustratedtransformer/.
+[22] Transformer Visualized. https://jalammar.github.io/illustrated-transformer/.
 [23] Common Crawl. https://commoncrawl.org/.
-[24] Cross‐Entropy. https://en.wikipedia.org/wiki/Crossentropy.
-[25] Prompt Engineering. https://platform.openai.com/docs/guides/promptengineering.
+[24] Cross‐Entropy. https://en.wikipedia.org/wiki/Cross-entropy.
+[25] Prompt Engineering. https://platform.openai.com/docs/guides/prompt-engineering.
 [26] Beam Search. https://en.wikipedia.org/wiki/Beam_search.
 [27] Perplexity. https://en.wikipedia.org/wiki/Perplexity.
 [28] Gmail Smart Compose: Real‐Time Assisted Writing. https://arxiv.org/abs/1906.00080.
-[29] WordPiece Tokenization. https://huggingface.co/learn/nlpcourse/en/chapter6/6.
+[29] WordPiece Tokenization. https://huggingface.co/learn/nlp-course/en/chapter6/6.
 [30] Better & Faster Large Language Models via Multi‐token Prediction. https://arxiv.org/abs/2404.19737.
 
 ---
 
 ## Chapter 3: Google Translate
 
-[1] Google Translate Service. https://blog.google/products/translate/googletranslatenewlanguages2024/.
+[1] Google Translate Service. https://blog.google/products/translate/google-translate-new-languages-2024/.
 [2] Neural Machine Translation by Jointly Learning to Align and Translate. https://arxiv.org/abs/1409.0473.
 [3] BERT: Pre‐training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805.
 [4] GPT Models. https://platform.openai.com/docs/models.
 [5] Claude Models. https://www.anthropic.com/claude.
 [6] Bidirectional Long Short‐Term Memory (BLSTM) Neural Networks for Reconstruction of Top‐Quark Pair Decay Kinematics. https://arxiv.org/abs/1909.01144.
-[7] BPE Tokenization. https://huggingface.co/learn/nlpcourse/en/chapter6/5.
+[7] BPE Tokenization. https://huggingface.co/learn/nlp-course/en/chapter6/5.
 [8] C4 Dataset. https://www.tensorflow.org/datasets/catalog/c4.
 [9] Wikipedia Dataset. https://www.tensorflow.org/datasets/catalog/wikipedia.
-[10] Stack Exchange Dataset. https://huggingface.co/datasets/HuggingFaceH4/stack‐e xchange‐preferences.
-[11] How Transformers Work. https://huggingface.co/learn/nlpcourse/en/chapter1/4.
+[10] Stack Exchange Dataset. https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences.
+[11] How Transformers Work. https://huggingface.co/learn/nlp-course/en/chapter1/4.
 [12] Exploring the Limits of Transfer Learning with a Unified Text‐to‐Text Transformer. https://arxiv.org/pdf/1910.10683.pdf.
 [13] BART: Denoising Sequence‐to‐Sequence Pre‐training for Natural Language Generation, Translation, and Comprehension. https://arxiv.org/abs/1910.13461.
 [14] mT5: A Massively Multilingual Pre‐trained Text‐to‐Text Transformer. https://arxiv.org/abs/2010.11934.
 [15] Multilingual Denoising Pre‐training for Neural Machine Translation. https://arxiv.org/abs/2001.08210.
 [16] BLEU Metric. https://en.wikipedia.org/wiki/BLEU.
 [17] ROUGE Metric. https://en.wikipedia.org/wiki/ROUGE_(metric).
-[18] METEOR Metric. https://www.cs.cmu.edu/~alavie/METEOR/pdf/BanerjeeLavie2005METEOR.pdf.
+[18] METEOR Metric. https://www.cs.cmu.edu/~alavie/METEOR/pdf/Banerjee-Lavie-2005-METEOR.pdf.
 [19] WordNet. https://wordnet.princeton.edu/.
-[20] No Language Left Behind: Scaling Human‐Centered Machine Translation. https://research.facebook.com/publications/nolanguageleftbehind/.
+[20] No Language Left Behind: Scaling Human‐Centered Machine Translation. https://research.facebook.com/publications/no-language-left-behind/.
 [21] Decoder‐Only or Encoder‐Decoder? Interpreting Language Model as a Regularized Encoder‐Decoder. https://arxiv.org/abs/2304.04052.
 [22] Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution. https://arxiv.org/abs/2103.06799.
 [23] Efficient Inference for Neural Machine Translation. https://arxiv.org/abs/2010.02416.
-[24] Meta’s Multilingual Model. https://ai.meta.com/blog/nllb200highqualitymachinetranslation/.
+[24] Meta’s Multilingual Model. https://ai.meta.com/blog/nllb-200-high-quality-machine-translation/.
 [25] Machine Translation Evaluation. https://en.wikipedia.org/wiki/Evaluation_of_machine_translation.
 [26] Word Error Rate (WER) Metric. https://en.wikipedia.org/wiki/Word_error_rate.
 [27] Automatic Language Identification Using Deep Neural Networks. https://research.google.com/pubs/archive/42538.pdf.
@@ -130,33 +130,33 @@
 [3] OpenAI’s Models. https://platform.openai.com/docs/models.
 [4] Google’s Gemini. https://gemini.google.com/.
 [5] Meta’s Llama. https://llama.meta.com/.
-[6] Beautiful Soup. https://beautifulsoup4.readthedocs.io/en/latest/.
+[6] Beautiful Soup. https://beautiful-soup-4.readthedocs.io/en/latest/.
 [7] Lxml. https://lxml.de/.
 [8] Document Object Model. https://en.wikipedia.org/wiki/Document_Object_Model.
-[9] Boilerplate Removal Tool. https://github.com/misobelica/jusText.
+[9] Boilerplate Removal Tool. https://github.com/miso-belica/jusText.
 [10] fastText. https://fasttext.cc/.
 [11] langid. https://github.com/saffsd/langid.py.
 [12] RoFormer: Enhanced Transformer with Rotary Position Embedding. https://arxiv.org/abs/2104.09864.
-[13] Llama 3 Human Evaluation. https://github.com/metallama/llama3/blob/main/eval_details.md.
+[13] Llama 3 Human Evaluation. https://github.com/meta-llama/llama3/blob/main/eval_details.md.
 [14] Exploring the Limits of Transfer Learning with a Unified Text‐to‐Text Transformer. https://arxiv.org/abs/1910.10683.
 [15] DeBERTa: Decoding‐enhanced BERT with Disentangled Attention. https://arxiv.org/abs/2006.03654.
 [16] Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. https://arxiv.org/abs/2006.16236.
 [17] Common Crawl. https://commoncrawl.org/.
 [18] C4 Dataset. https://www.tensorflow.org/datasets/catalog/c4.
-[19] Stack Exchange Dataset. https://github.com/EleutherAI/stackexchangedataset.
+[19] Stack Exchange Dataset. https://github.com/EleutherAI/stackexchange-dataset.
 [20] Training Language Models to Follow Instructions with Human Feedback. https://arxiv.org/abs/2203.02155.
 [21] Alpaca. https://crfm.stanford.edu/2023/03/13/alpaca.html.
-[22] Dolly‐15K. https://www.databricks.com/blog/2023/04/12/dollyfirstopencommerciallyviableinstructiontunedllm.
-[23] Introducing FLAN: More Generalizable Language Models with Instruction Fine‐Tuning. https://research.google/blog/introducingflanmoregeneralizablelanguagemodelswithinstructionfinetuning/.
+[22] Dolly‐15K. https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm.
+[23] Introducing FLAN: More Generalizable Language Models with Instruction Fine‐Tuning. https://research.google/blog/introducing-flan-more-generalizable-language-models-with-instruction-fine-tuning/.
 [24] Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. https://arxiv.org/abs/2204.05862.
 [25] Proximal Policy Optimization Algorithms. https://arxiv.org/abs/1707.06347.
 [26] Direct Preference Optimization: Your Language Model is Secretly a Reward Model. https://arxiv.org/abs/2305.18290.
 [27] Illustrating RLHF. https://huggingface.co/blog/rlhf.
 [28] RLHF Progress and Challenges. https://www.youtube.com/watch?v=hhiLw5Q_UFg.
 [29] State of GPT. https://www.youtube.com/watch?v=bZQun8Y4L2A.
-[30] Different Sampling Methods. https://huggingface.co/blog/how‐to‐generate.
+[30] Different Sampling Methods. https://huggingface.co/blog/how-to-generate.
 [31] The Curious Case of Neural Text Degeneration. https://arxiv.org/abs/1904.09751.
-[32] OpenAI’s API Reference. https://platform.openai.com/docs/apireference/chat/create.
+[32] OpenAI’s API Reference. https://platform.openai.com/docs/api-reference/chat/create.
 [33] Cheat Sheet: Mastering Temperature and Top_p in ChatGPT API. https://community.openai.com/t/cheat‐sheet‐mastering‐temperature‐and‐top‐p‐in‐chatgpt‐api/172683.
 [34] PIQA: Reasoning about Physical Commonsense in Natural Language. https://arxiv.org/abs/1911.11641.
 [35] SocialIQA: Commonsense Reasoning about Social Interactions. https://arxiv.org/abs/1904.09728.
@@ -169,10 +169,10 @@
 [42] SQuAD: 100,000+ Questions for Machine Comprehension of Text. https://arxiv.org/abs/1606.05250.
 [43] QuAC Dataset. https://quac.ai/.
 [44] BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions. https://arxiv.org/abs/1905.10044.
-[45] GSM8K Dataset. https://github.com/openai/gradeschoolmath.
+[45] GSM8K Dataset. https://github.com/openai/grade-school-math.
 [46] MATH Dataset. https://github.com/hendrycks/math/.
-[47] HumanEval Dataset. https://github.com/openai/humaneval.
-[48] MBPP Dataset. https://github.com/googleresearch/googleresearch/tree/master/mbpp.
+[47] HumanEval Dataset. https://github.com/openai/human-eval.
+[48] MBPP Dataset. https://github.com/google-research/google-research/tree/master/mbpp.
 [49] Measuring Massive Multitask Language Understanding. https://arxiv.org/abs/2009.03300.
 [50] Measuring Massive Multitask Language Understanding. https://arxiv.org/abs/2009.03300.
 [51] AGIEval: A Human‐Centric Benchmark for Evaluating Foundation Models. https://arxiv.org/abs/2304.06364.
@@ -187,24 +187,24 @@
 [60] Question Answering for Privacy Policies: Combining Computational and Legal Perspectives. https://arxiv.org/abs/1911.00841.
 [61] AdvGLUE Benchmark. https://adversarialglue.github.io/.
 [62] Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. https://arxiv.org/abs/1907.11932.
-[63] AdvBench. https://github.com/llmattacks/llmattacks.
-[64] Chatbot Arena Leaderboard. https://lmsys‐chatbotarenaleaderboard.hf.space/.
+[63] AdvBench. https://github.com/llm-attacks/llm-attacks.
+[64] Chatbot Arena Leaderboard. https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leaderboard.
 [65] A Survey on Recent Advances in LLM‐Based Multi‐Turn Dialogue Systems. https://arxiv.org/abs/2402.18013.
 [66] Better & Faster Large Language Models via Multi‐Token Prediction. https://arxiv.org/abs/2404.19737.
 [67] Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context. https://arxiv.org/abs/2403.05530.
 [68] HyperAttention: Long‐Context Attention in Near‐Linear Time. https://arxiv.org/abs/2310.05869.
 [69] MM‐LLMs: Recent Advances in Multimodal Large Language Models. https://arxiv.org/abs/2401.13601.
 [70] Multimodality and Large Multimodal Models. https://huyenchip.com/2023/10/10/multimodal.html.
 [71] What is Retrieval‐Augmented Generation? https://cloud.google.com/use‐cases/retrieval‐augmented‐generation.
-[72] How to Customize an LLM: A Deep Dive to Tailoring an LLM for Your Business. https://techcommunity.microsoft.com/t5/ai‐machine‐learning‐blog/how‐to‐customize‐an‐llm‐a‐deepdive‐to‐tailoring‐an‐llmforyour/ba‐p/4110204.
+[72] How to Customize an LLM: A Deep Dive to Tailoring an LLM for Your Business. https://techcommunity.microsoft.com/blog/machinelearningblog/how-to-customize-an-llm-a-deep-dive-to-tailoring-an-llm-for-your-business/4110204.
 [73] Llama 2: Open Foundation and Fine‐Tuned Chat Models. https://arxiv.org/abs/2307.09288.
 [74] Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned. https://arxiv.org/abs/2209.07858.
-[75] Introducing Superalignment. https://openai.com/index/introducingsuperalignment/.
+[75] Introducing Superalignment. https://openai.com/index/introducing-superalignment/.
 [76] Language Models are Few‐Shot Learners. https://arxiv.org/abs/2005.14165.
 [77] GQA: Training Generalized Multi‐Query Transformer Models from Multi‐Head Checkpoints. https://arxiv.org/abs/2305.13245.
 [78] Chain‐of‐Thought Prompting Elicits Reasoning in Large Language Models. https://arxiv.org/abs/2201.11903.
 [79] Efficiently Scaling Transformer Inference. https://arxiv.org/abs/2211.05102.
-[80] Prover‐Verifier Games Improve Legibility of Language Model Outputs. https://openai.com/index/proververifiergamesimprovelegibility/.
+[80] Prover‐Verifier Games Improve Legibility of Language Model Outputs. https://openai.com/index/prover-verifier-games-improve-legibility/.
 
 ---
 