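The book arrived the day after I ordered it on JD.com. I don't have much formal education and couldn't follow the advanced formulas in it, so I skimmed through and then tracked down the sources it draws on. It turns out most of them are DeepSeek papers.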
Gating network
Adaptive Mixtures of Local Experts
https://www.cs.toronto.edu/~fritz/absps/jjnh91.pdf
GRPO algorithm
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
https://arxiv.org/pdf/2402.03300
One blogger has put together a technical analysis series on DeepSeek; the linked articles also point to the DeepSeek papers.
Mixture of experts
DeepSeek Technical Analysis — (1) Mixture-of-Experts
https://dataturbo.medium.com/key-techniques-behind-deepseek-models-10x-efficiency-1-moe-9bd2534987c8
Multi-head latent attention
DeepSeek Technical Analysis — (2)Multi-Head Latent Attention
https://dataturbo.medium.com/deepseek-technical-analysis-2-mla-74bdb87d4ad2
Multi-token prediction
DeepSeek Technical Analysis — (3) Multi-Token Prediction
https://dataturbo.medium.com/deepseek-technical-analysis-3-multi-token-prediction-f8f3ea7eaf9c
DualPipe algorithm
DeepSeek Technical Analysis — (4)DualPipe
https://dataturbo.medium.com/deepseek-technical-analysis-4-dualpipe-672d0ef63ee6
FP8-precision training
DeepSeek Technical Analysis — (5) FP8 Training
https://dataturbo.medium.com/deepseek-technical-analysis-5-fp8-training-ff34768727b8