Transformer
Core Papers and Code
Key Concepts
- QKV computation in Self-Attention
- The role of Scaled Dot-Product
- Principles of Multi-Head Attention
- Tokenization and Tokenizers
- Word Embedding
- Positional Encoding
- Attention Mechanism
- Feed Forward Network
- Masking
- Layer Normalization
- Decoding Techniques
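The concepts listed above are the building blocks of the Transformer. As a quick orientation before the deep-dive material, the scaled dot-product attention step (the core of QKV computation) can be sketched in NumPy. This is a minimal illustration under the standard formulation Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, not code from any paper or repository linked on this page:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # similarity of each query to each key, scaled by sqrt(d_k)
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    if mask is not None:
        # positions where mask is False receive ~zero attention weight
        scores = np.where(mask, scores, -1e9)
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

# toy example: 3 tokens, head dimension d_k = 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))

out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)

# causal (look-ahead) mask, as used in the decoder
causal_mask = np.tril(np.ones((3, 3), dtype=bool))
masked_out = scaled_dot_product_attention(Q, K, V, mask=causal_mask)
```

Multi-head attention runs this same computation in parallel on several learned projections of Q, K, and V, then concatenates the results.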
Deep Dive
- Transformer paper paragraph-by-paragraph reading [Paper Reading]
Attention Mechanism Learning Resources
- [HD bilingual subtitles] Andrew Ng explains Transformer working principles in detail (2025)
- Mastering the Attention Mechanism thoroughly
Contributors
Mira190 — 2 contributions · last 2025/09/13
github-actions[bot] — 1 contribution · last 2026/05/11
longsizhuo — 1 contribution · last 2026/05/06
Involution Hell © 2026 by Community, under CC BY-NC-SA 4.0