Reading notes on 《DeepSeek核心技术揭秘》 (Unveiling DeepSeek's Core Technologies)

四月的奥德赛 Study materials

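The book arrived the day after I ordered it on JD.com. I don't have an advanced academic background, so I couldn't follow the more advanced formulas in it. I skimmed through the book and then tracked down the sources it draws on, and found that most of them are DeepSeek's own papers.

Gating networks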

Adaptive Mixtures of Local Experts

https://www.cs.toronto.edu/~fritz/absps/jjnh91.pdf

GRPO algorithm

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

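https://arxiv.org/pdf/2402.03300

A blogger has put together a series of technical analyses of DeepSeek. The articles linked below include references to the DeepSeek papers.

Mixture of experts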

DeepSeek Technical Analysis — (1) Mixture-of-Experts

https://dataturbo.medium.com/key-techniques-behind-deepseek-models-10x-efficiency-1-moe-9bd2534987c8

Multi-head latent attention (MLA)

DeepSeek Technical Analysis — (2)Multi-Head Latent Attention

https://dataturbo.medium.com/deepseek-technical-analysis-2-mla-74bdb87d4ad2

Multi-token prediction

DeepSeek Technical Analysis — (3) Multi-Token Prediction

https://dataturbo.medium.com/deepseek-technical-analysis-3-multi-token-prediction-f8f3ea7eaf9c

DualPipe algorithm

DeepSeek Technical Analysis — (4)DualPipe

https://dataturbo.medium.com/deepseek-technical-analysis-4-dualpipe-672d0ef63ee6

FP8-precision training

DeepSeek Technical Analysis — (5) FP8 Training

https://dataturbo.medium.com/deepseek-technical-analysis-5-fp8-training-ff34768727b8

 

 
