NumeroIssue,IdIssue,TituloIssue,DescricaoIssue,CriacaoIssue,RepositorioIssue,LinkIssue
754,2905089893,Create django.yml, ,2025-03-09T00:09:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/754
752,2900922588,[BUG],"**Describe the bug**
DeepSeek AI is stuck on the ""One more step before you proceed..."" loading screen and does not proceed further, preventing access to the platform.

**To Reproduce**
Steps to reproduce the behavior:
Open a web browser (Chrome, Firefox, Edge, etc.).
Navigate to DeepSeek AI.
The page displays ""One more step before you proceed..."" and does not load further.

**Expected behavior**
DeepSeek AI should load successfully and allow access to its features.

**Screenshots**


**Additional context**
Device: Windows
Network Type: All (Wi-Fi, mobile data, office network)
Antivirus: K7 Ultimate Security
Troubleshooting Attempts:
Tried different browsers
Cleared cache and cookies
Disabled browser extensions
Used incognito mode
Checked   settings",2025-03-06T16:47:20Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/752
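748,2899176133,"[BUG] When a conversation gets long enough, the AI stops feeling lively and even starts to seem somewhat robotic","**Describe the bug**
I'm trying to use the DeepSeek API to build an AI agent with persistent memory, like a real person whose memories never fade. However, I've found that once the conversation reaches a certain length (first observed at around tokens used = 35K), the AI's responses start to lack originality and sometimes even repeat earlier outputs verbatim. As the token count grows, the AI increasingly seems to ""rely on its existing token memory"".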

**To Reproduce**
Let the conversation reach tokens used ≥ 35K.

**Expected behavior**
 ",2025-03-06T02:42:38Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/748
747,2896603638,Add zh version of README,"Synced with the current README.md. I hope this will:

- encourage more localized versions, PRs, and issues;
- establish a baseline style for localization variants",2025-03-05T08:27:37Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/747
746,2896471158,Sharing my study notes on DeepSeekV3,"
 
Here are my notes on DeepSeekV3 (currently covering only the model part).


",2025-03-05T07:22:59Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/746
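745,2896397540,"Official API: how do I enable web search, image-upload parsing, and attachment parsing? Is there live human support or a technical support group?","curl -X POST ""  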
-H ""Authorization: Bearer KEY""  
-H ""Content-Type:    
-d '{
  ""model"": ""deepseek-chat"",
  ""messages"": [
    {""role"": ""user"", ""content"": ""How do I enable web search, image-upload parsing, and attachment parsing with the deepseek-chat model?""}
  ],
  ""stream"": true
}'",2025-03-05T06:40:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/745
744,2895931860,Could you share a few data examples for pure RL training?,Thank you very much!,2025-03-05T02:17:18Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/744
743,2895661685,Create an AI that can write a WORKING SIMPLE Python APP.,"Just hilarious to see all the HYPE around how great DEEPSEEK is. While it might be good at TELLING JOKES, it's terrible at writing PYTHON CODE.

Every time I have asked it to create a simple Python app, it has failed, even after more than 20 attempts.

Hilarious how much money has been wasted on AI. Just a JOKE!!!!!!!

",2025-03-04T23:08:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/743
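741,2894073252,"[BUG] If the Gate bias is a large negative value, masked expert groups can be selected","If the bias is a large negative value, the scores become almost all negative, while the masked expert groups have a score of 0. As a result, the masked expert groups get selected, i.e., experts whose original scores were smaller. Consider masking by adding -inf instead, or constraining the bias to be >= 0.",2025-03-04T12:20:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/741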
736,2891088516,Docs: add LightLLM as supported engine,"Thanks for the significant contributions to the community. LightLLM now supports   deployment in its latest release, with PD-disaggregation and other features coming soon. We look forward to future collaborations. Our optimizations will also be continuously introduced on  ",2025-03-03T12:17:36Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/736
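734,2890203249,Questions about DualPipe that remain unclear after reading the paper and related materials; could your team please answer?,"Q1. In Fig. 5 of the DeepSeekV3 paper, what are the three forward passes on device0 between orange cell 6 and orange cell 9?
Q2. When deploying DeepSeekV3, does device0 host both the first and the last layer of the model, to support bidirectional training? I ask because in Fig. 5, when device0 feeds in the first batch for a forward pass, device7 is also feeding in a batch for a forward pass at the same time.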

 ",2025-03-03T05:24:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/734
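733,2890025244,[Question] Are there any reference cases for deploying a consumer-facing (toC) V3 inference service?,"The discussion environment on Discord is terrible... I had no choice but to ask for help here in the issues, so please bear with me!
Current hardware:
GPUs: several servers, each with 8× A800 (80GB) (unfortunately they are not Hopper architecture, so I can't try DeepEP)
What I've tried:
1. Ollama: a single machine supports only about 2 to 3 concurrent requests, and Ollama doesn't seem to support multi-node distributed deployment.
2. vLLM: deployed across multiple nodes with IB + Ray, but concurrency didn't seem to improve much.

For now, I'd like to target 20 to 30 concurrent requests as a first step. Could anyone share some ideas or existing solutions? I've never deployed a model with this many parameters before, so I'd be very grateful for guidance from anyone with relevant experience!",2025-03-03T02:48:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/733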
731,2888946352,Question: Where can I download a 1.5B model for ollama?,"Hello,

I searched online for a 1.5B model for Ollama (as I do not have enough computing resources), without success.
My computer currently runs under Linux.
Could you please provide a   to download the open-source model?

Thank you,

----",2025-03-01T14:40:29Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/731
730,2888835157,Could the open-source repository be mirrored to Gitee?,"As the title says...
GitHub keeps acting up.",2025-03-01T11:30:27Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/730
729,2888036512,add intro file,"This is a convenience file; it contains the first section of the Technical Report.",2025-02-28T20:48:46Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/729
728,2886414741,[BUG] tool call does not work,"**Describe the bug**
When I use TOOL CALL, the RESPONSE comes back empty.

**To Reproduce**
Use the EXAMPLE in   directly.

**Expected behavior**
There should be a response.

**Screenshots**


",2025-02-28T07:25:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/728
725,2884924019,RTL style (direction) for Persian Chats,"Similar to this issue:
 
DeepSeek supporters, please fix this the way the ChatGPT chatbot does.
We need RTL direction for   chats.
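Here is a minimal sketch of one way a client could pick the direction for each message. It assumes the chat UI can set a `dir` attribute on every message bubble. The code finds the first strongly directional character, which is the same heuristic that HTML's `dir=auto` uses. The function name `base_direction` is hypothetical.

```python
import unicodedata

def base_direction(text):
    # Scan for the first strong directional character (Unicode bidi class)
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ('R', 'AL'):  # Hebrew-style (R) or Arabic-script (AL) letters, incl. Persian
            return 'rtl'
        if bidi == 'L':  # Latin and other left-to-right letters
            return 'ltr'
    return 'ltr'  # no strong character found (e.g. only digits or punctuation)

print(base_direction('سلام دنیا'))   # rtl
print(base_direction('hello سلام'))  # ltr
```

Digits and punctuation are skipped because they are not strongly directional. A message such as `123 سلام` is therefore still treated as right-to-left.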
  
",2025-02-27T15:52:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/725
724,2883462469,Possible Typo in ZB1P Pipeline Bubble Calculation Formula in DeepSeek-V3 Report,"In the DeepSeek-V3 report PDF, I noticed that on page 13, the total bubble for the ZB1P pipeline parallel method is described as (PP-1)(F+B-2W), whereas in the original Zero Bubble paper, the total bubble for the ZB-H1 method should be (PP-1)(F+B-W). Could this be a typo?",2025-02-27T05:19:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/724
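722,2882237048,Request for a QQ group (the WeChat group has too few people),Request for a QQ group since the WeChat group has too few people. Please organize a QQ developer group.,2025-02-26T16:58:01Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/722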
721,2880823234,[BUG]json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0).,"from openai import OpenAI
client = OpenAI(api_key="""", base_url="""")
response = client.chat.completions.create(
    model=""deepseek-reasoner"",
    messages=[
        {""role"": ""system"", ""content"": ""You are a helpful assistant""},
        {""role"": ""user"", ""content"": ""1+1=?""},
    ],
    temperature=0
)
print(response)
print(response.choices[0].message)
print(response.choices[0].message.content)
When I use the official code to call the API, I get: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0).
Why does this happen? The code works the first time I run it, but only once; after that it stops working. Can anyone help?",2025-02-26T09:12:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/721
720,2880811412,modify the explanation of MLA, ,2025-02-26T09:08:31Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/720
718,2879661971,Rename DeepSeek_V3.pdf to DeepSeekv3pdf, ,2025-02-25T22:07:59Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/718
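716,2878092407,Technical consultation on a new AI setup: AI + dynamic knowledge bases with data isolation for enterprise users,"My business scenario:
Built on DeepSeek's large model (or application?), the thousands of enterprise users who come to my website should be able to ask questions using this model (application?).
Each enterprise should also be able to create its own knowledge base and attach it to this model (application?). That way, my enterprise users get both the model (application?) and their own proprietary knowledge base, so each enterprise's knowledge base must be data-isolated.
What I've tried so far:
1. The open API for the large model cannot be combined with a knowledge base to produce Q&A.
2. The open API for applications does have knowledge bases, but it cannot dynamically select among the different knowledge bases under an application as my scenario requires.

So I need both AI reasoning/Q&A capability and knowledge-base capability for every enterprise user, while keeping each enterprise's data isolated. After all, business data must not leak between companies, right?

What technical approaches could implement this scenario?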
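One common pattern is sketched here as an assumption, not as a DeepSeek feature. Do retrieval in your own service and keep a separate store per tenant. Search only the calling tenant's store, then pass the retrieved snippets to the chat API as context. `TenantKnowledgeBase` and `build_messages` are hypothetical names. The word-overlap scoring stands in for a real embedding search.

```python
from collections import defaultdict

class TenantKnowledgeBase:
    # Toy store keyed by tenant ID: a search only ever sees that tenant's documents
    def __init__(self):
        self._docs = defaultdict(list)

    def add(self, tenant_id, text):
        self._docs[tenant_id].append(text)

    def search(self, tenant_id, query, k=2):
        # Naive relevance score: number of shared lowercase words (stand-in for embeddings)
        words = set(query.lower().split())
        scored = []
        for doc in self._docs.get(tenant_id, []):
            overlap = len(words & set(doc.lower().split()))
            if overlap > 0:
                scored.append((overlap, doc))
        scored.sort(key=lambda item: item[0], reverse=True)
        return [doc for _, doc in scored[:k]]

def build_messages(kb, tenant_id, question):
    # Retrieved snippets go into the system message as context for the chat model
    context = '\n'.join(kb.search(tenant_id, question))
    return [
        {'role': 'system', 'content': 'Answer using only this context:\n' + context},
        {'role': 'user', 'content': question},
    ]

kb = TenantKnowledgeBase()
kb.add('acme', 'refund policy is 30 days')
kb.add('globex', 'refund policy is 90 days')
print(kb.search('acme', 'what is the refund policy'))  # ['refund policy is 30 days']
```

Isolation in this sketch comes from always passing the authenticated tenant ID to `search`. In production, that ID would come from the session, never from user-supplied text.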
Thanks for any guidance.",2025-02-25T11:34:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/716
714,2876542746,Deepseek discussion group for 360 kinds of applications,"#

",2025-02-25T00:54:52Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/714
711,2873660513,"[BUG] (Due to technical issues, the search service is temporarily unavailable.)","**Describe the bug**
(Due to technical issues, the search service is temporarily unavailable.)
**To Reproduce**
Steps to reproduce the behavior.
Enable Search function and search for something
**Expected behavior**

**Screenshots**


**Additional context**
",2025-02-24T01:38:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/711
710,2873630090,AI-Powered Debugging Bot,"**Is your feature request related to a problem? Please describe.**
Developers spend a lot of time debugging, but understanding stack traces and logs is challenging.

**Describe the solution you'd like**
Deepseek could analyze errors, suggest fixes, and even explain what went wrong in simple terms.

",2025-02-24T01:08:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/710
707,2872883370,Delete Meaningless issues #2,"
1. #706 AD
2. #704 
3. #703 ~~Generated by DeepSeek~~
4. #698 
5. #697 AD
6. #694 AD
7. #695 
8. #664 ",2025-02-23T03:14:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/707
705,2870908224,"Why does ""self.dim"" have to be 7168?","`class Gate(nn.Module):
    def __init__(self, args: ModelArgs):
        super().__init__()
        self.dim = args.dim
        self.topk = args.n_activated_experts
        self.n_groups = args.n_expert_groups
        self.topk_groups = args.n_limited_groups
        self.score_func = args.score_func
        self.route_scale = args.route_scale
        self.weight = nn.Parameter(torch.empty(args.n_routed_experts, args.dim))
        self.bias = nn.Parameter(torch.empty(args.n_routed_experts)) if self.dim == 7168 else None`
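A likely explanation follows. It is an assumption, not a confirmed answer:

- 7168 is the `dim` in the 671B DeepSeek-V3 config.
- The DeepSeek-V3 technical report describes auxiliary-loss-free load balancing, which adds a per-expert bias to the affinity scores.
- According to the report, that bias only affects which experts are selected. The gating weights still come from the original scores.
- The smaller configs would then simply not use the bias.

Below is a pure-Python sketch of bias-adjusted top-k routing. `route_with_bias` is a hypothetical helper, not the repo's code.

```python
def route_with_bias(scores, bias, k):
    # Selection uses bias-adjusted scores, so the bias only decides WHICH experts are picked
    adjusted = [s + b for s, b in zip(scores, bias)]
    chosen = sorted(range(len(scores)), key=lambda i: adjusted[i], reverse=True)[:k]
    # Gating weights come from the original scores, normalized over the chosen experts
    total = sum(scores[i] for i in chosen)
    weights = [scores[i] / total for i in chosen]
    return chosen, weights

# Expert 2 scores lower than expert 1 but has a positive bias, so it is selected instead
experts, weights = route_with_bias([0.9, 0.5, 0.4, 0.1], [0.0, 0.0, 0.3, 0.0], k=2)
print(experts)  # [0, 2]
```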
why??",2025-02-22T15:49:36Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/705
704,2870756970,Deep seek, ,2025-02-22T13:58:43Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/704
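703,2870663380,Feature suggestions - improving user interaction and personalization,"**Suggestion**  
I'd like to propose the following feature improvements for   to enhance the user experience and personalized service:

1. **More accurate prediction of user intent**  
   - Currently, users must provide very precise prompts to get the desired results. I suggest optimizing the model so that it predicts user intent more accurately and relies less on precise prompts.

2. **User profiling**  
   - As a user asks more questions,   could gradually build a user profile, inferring information such as the user's occupation, family, and health.  
   - For example, if a user has asked about their career, family, or mental health,   could store this information keyed by user account and use it to give more personalized answers.  
   - Note: this feature must strictly follow privacy-protection principles to keep user data secure.

3. **Personalized user memory**  
   - Allow users to give   suggestions, opinions, or personalized information to remember.  
   - When other users ask about related topics,   could provide this information as an answer.  
   - This would apply only to niche, personalized content that cannot be found online.  
   - The importance or severity of questions and answers would need to be assessed to avoid abuse or leakage of sensitive information.

**Background**  
Current AI models still rely on precise user input for interaction and lack a deep understanding of users' personalized needs. Introducing user profiling and personalized memory could significantly improve the user experience and make   smarter and more considerate.

**Details**  
1. **Improved intent prediction**  
   - Improve the model's understanding of user intent by analyzing the user's question history and context.  
   - Reduce reliance on precise prompts for a more natural interaction experience.

2. **Building user profiles**  
   - Gradually build a user profile (occupation, family, health, etc.) from the content of the user's questions.  
   - Use the profile to provide personalized answers while protecting user privacy.

3. **Personalized memory**  
   - Allow users to submit personalized information to be remembered.  
   - When other users ask about related topics,   could provide this information as an answer.  
   - Introduce an evaluation mechanism to ensure that questions and answers are safe.

**Expected outcomes**  
- More accurate user-intent prediction and less input burden on users.  
- More considerate, personalized service through user profiling and personalized memory.  
- A better interaction experience between users and  , making it a smarter assistant.",2025-02-22T10:50:48Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/703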
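697,2868905071,Free full-power R1 API is here! Tens of millions of tokens up for grabs!,"▌Key point: this is the real developer power tool!
No top-up needed! No need to race for it! Click to go straight there👉  
🔥 Three core advantages that crush the competition:
✔️ Claim 5 million tokens daily until 2025.2.28 (3 billion in total!)
✔️ The only full-power 671B-parameter model on the web
✔️ Supports real-time web search + private knowledge-base integration

This is not an ad; if it is, I'll eat three pounds of crap! It's a win-win: using my invite code also gets you an extra 15-yuan voucher, worth 7.5 million tokens!
🚀 The highest-quality DeepSeek R1 tokens on the web, latency under 20 ms, 5 million TPM!
🔥 The full DeepSeek-R1 experience: no lag, pure bliss!
🎁 For every new user you invite, both of you get up to 30 million+ tokens. The more you invite, the more you get, with no cap!

Grab the freebies now:  ",2025-02-21T12:29:41Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/697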
696,2868360247,[BUG] Gate/grouped_topk scoring func dtype issue(BF16 vs FP32),"**Describe the bug**

In the Hugging Face implementation, GateMoE uses FP32 for the linear layer and the subsequent computation; see  

In the GitHub implementation, by contrast, the Gate weight is BF16, and BF16 is used for the linear layer, scoring_func (sigmoid), and the subsequent computation; see link  

Would this cause accuracy issues? And is there any   impl? 
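Here is one concrete way to see the precision question. BF16 keeps only 8 significant bits, so the sigmoid scores of two nearby logits can round to the same value and change which expert wins a tie. The pure-Python sketch below emulates BF16 rounding with `struct`. The helper `to_bf16` is hypothetical and not taken from either implementation. This does not establish whether the effect matters in practice for DeepSeek-V3 routing.

```python
import math
import struct

def to_bf16(x):
    # Emulate bfloat16: round the float32 bit pattern to its top 16 bits (round-to-nearest-even)
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits = (bits >> 16) << 16
    return struct.unpack('>f', struct.pack('>I', bits & 0xFFFFFFFF))[0]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Two expert logits that differ by 0.005
a, b = sigmoid(1.980), sigmoid(1.985)
print(a == b)                    # False: distinct at full precision
print(to_bf16(a) == to_bf16(b))  # True: both round to 0.87890625 in BF16
```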


**To Reproduce**

**Expected behavior**

**Screenshots**

**Additional context**
",2025-02-21T08:31:30Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/696
695,2868316957,Why is it always busy?,"EVERY TIME I TYPE MY FIRST QUESTION, IT GOES BUSY 

woe",2025-02-21T08:08:14Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/695
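694,2868220356,DeepSeek R1 671B full-power version free for a limited time! Get 5 million tokens free every day,"▌Key point: this is the real developer power tool!
No top-up needed! No need to race for it! Click to go straight there👉  
🔥 Three core advantages that crush the competition:
✔️ Claim 5 million tokens daily until 2025.2.28 (3 billion in total!)
✔️ The only full-power 671B-parameter model on the web
✔️ Supports real-time web search + private knowledge-base integration

💡 Candid reviews from real users:
""On some platforms, customer support vanishes after you top up? That never happens with DeepSeek R1! Responses are maxed out, it has so many features it feels like cheating, and above all it's rock-solid!"" (a full-stack engineer who preferred to remain anonymous)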

",2025-02-21T07:12:57Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/694
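693,2868146364,Are there plans to release smaller-parameter deepseek-v3 models?,"The code includes configs for 16B, 236B, and 671B models, but only the 671B weights seem to be available so far. Are there plans to open-source the other two versions?",2025-02-21T06:40:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/693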
689,2865352999,Distillation of non-reasoning models,Have you tried distilling non-reasoning models? How well did it work?,2025-02-20T07:50:04Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/689
688,2864784328,The returned content is in a rich-text-like format: which existing front-end tools can parse and fully render it?,"

 
",2025-02-20T01:26:06Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/688
687,2863706725,Turkish Translation: README & README_WEIGHTS,"### **Description:**  
This PR adds Turkish translations for the following documentation files:  

-   →    
-   →    

The translations maintain the original document structure, ensuring both technical accuracy and Markdown compatibility.  

To improve organization, I can move the translations to a   folder if preferred. Additionally, we can list available translations in the main   for better accessibility. Let me know if you’d like any modifications! 🚀  

GitHub Repository: DeepSeek-V3-Turkish

📩 **Contact:** [can.deliktas  

Best regards,  
**Can Deliktaş**  ",2025-02-19T15:34:29Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/687
686,2863480789,[BUG] torch version unavailable (Fedora 41),"**Describe the bug**
The torch version specified in the requirements is not available on Fedora 41.

 
**To Reproduce**
Install Fedora 41
Update to latest packages
Follow install 

**Expected behavior**
Requirements should install even on newer versions.

**Screenshots**
NA

**Additional context**
NA
",2025-02-19T14:11:43Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/686
685,2862792912,[BUG],"**Describe the bug**
We deployed the DeepseekV3 model on a server with eight RTX 3090 GPUs, and launching it from the command line produced the following error:

**To Reproduce**
Our cuda is 12.1, pytorch is 2.4.1, and python is 3.10


**Expected behavior**

**Screenshots**

 
**Additional context**
",2025-02-19T09:47:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/685
680,2861338347,Update model.py,hggchfcchchchchgchc,2025-02-18T18:52:32Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/680
679,2859212848,[BUG],"**Describe the bug**
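Scenario: chatting with V3 on the official website, with DeepThink and web search both disabled, I entered: ""How many weeks are there this year, and which week is this?"" The reply was: ""2023 has 52 weeks. Today is October 12, 2023, and this is week 41 (from October 9, 2023 to October 15, 2023)."" But I actually had this conversation a little after 10 a.m. on February 18, 2025, so the reply is clearly wrong.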
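The model cannot know today's date unless it is given one. For API integrations, a common workaround is to compute the date locally and put it in the system prompt. That workaround is an assumption here, not an official recommendation. The sketch below also assumes ISO 8601 week numbering was the intended meaning of "week". `week_info` is a hypothetical helper.

```python
from datetime import date

def week_info(day):
    # ISO 8601 week number of `day`, plus the number of ISO weeks in that ISO year
    iso_year, iso_week, _ = day.isocalendar()
    # December 28 always falls in the last ISO week of its year
    weeks_in_year = date(iso_year, 12, 28).isocalendar()[1]
    return iso_week, weeks_in_year

today = date(2025, 2, 18)
week, total = week_info(today)
system_prompt = f'Today is {today.isoformat()}, ISO week {week} of {total}.'
print(system_prompt)  # Today is 2025-02-18, ISO week 8 of 52.
```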

**To Reproduce**

**Expected behavior**

**Screenshots**


**Additional context**
",2025-02-18T03:02:38Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/679
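677,2857906954,Which training framework does deepseekV3 use?,"**Is your feature request related to a problem? Please describe.**
In the DEEPSEEK_LLM paper, I saw that you use the HAI_LLM framework for model training, along with 3D parallelism. Do V3 and R1 still use this framework? How is the 3D parallelism implemented? Could you share some details?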
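The sketch below does not describe HAI-LLM. It is a generic illustration of how 3D parallelism is often laid out. Each global rank maps to a (data-parallel, pipeline, tensor-parallel) coordinate, and ranks that share two of the three coordinates form a communication group. The ordering used here, with tensor parallelism innermost, is an assumption borrowed from common practice. `rank_to_coords` and `tensor_parallel_group` are hypothetical helpers.

```python
def rank_to_coords(rank, tp_size, pp_size, dp_size):
    # Assumed layout: tensor-parallel ranks innermost, then pipeline stages, then data-parallel replicas
    assert 0 <= rank < tp_size * pp_size * dp_size
    tp = rank % tp_size
    pp = (rank // tp_size) % pp_size
    dp = rank // (tp_size * pp_size)
    return dp, pp, tp

def tensor_parallel_group(rank, tp_size):
    # Ranks sharing this rank's DP and PP coordinates; together they split each layer's weights
    start = rank - rank % tp_size
    return list(range(start, start + tp_size))

# 16 GPUs split as TP=2, PP=4, DP=2
print(rank_to_coords(13, tp_size=2, pp_size=4, dp_size=2))  # (1, 2, 1)
print(tensor_parallel_group(13, tp_size=2))                 # [12, 13]
```

Placing tensor parallelism innermost keeps its frequent all-reduces on adjacent GPUs, which usually means within one node.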

**Describe the solution you'd like**

**Describe alternatives you've considered**

**Additional context**
",2025-02-17T13:42:21Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/677
676,2857645037,Code Audit on DeepSeek V3,"Hi all! 

I have created a code auditing tool using AI and I let DeepSeek-V3 run through it. Hopefully its helpful:  
It definitely has one of the better outcomes, with one orange flag: 


CodeDD is still WIP (developed by just me), so I'm happy and keen to receive your feedback!

Camillo

PS: Not sure if this is the right spot to post this  ",2025-02-17T12:00:02Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/676
675,2857228342,deepseek amis  amis-api,Recommending a really useful  ,2025-02-17T09:10:45Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/675
673,2856938307,What do the 3 config files in the inference example each represent?,There are 3 config files in the inference directory. What does each one represent?,2025-02-17T06:53:31Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/673
672,2856856622,Why does the FP8 accumulator need 34 bits? What is the theoretical basis for this?, ,2025-02-17T06:03:01Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/672
671,2856251959,Update LICENSE-CODE, ,2025-02-16T18:14:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/671
670,2856104414,[Question] Why does the DeepseekV3Model class disable gradient checkpointing by default?,"Thanks to the DeepSeek team for the excellent work!

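While reading the DeepSeek-V3 model code on HuggingFace, I found that in  , the DeepseekV3PreTrainedModel class declares support for  , but the DeepseekV3Model class appears to disable gradient checkpointing by default (line 1372). The code snippet is as follows: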

 
Could the developers explain why it is set up this way? I look forward to answers from the developers and the community.",2025-02-16T13:31:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/670
669,2855881617,[BUG] Unexpected Chinese-Language Response in English Conversation (Cached Data Crosstalk),"During a conversation about regex whitelisting for #BlockIt on macOS, the AI returned a Chinese-language technical tutorial about   alignment that was unrelated to the query. This appears to be a cached data cross-talk error.

1. **User Query**:  
can you make a regex to whitelist a subreddit

2. **Follow-up**:  
Clarifications about #BlockIt macOS usage
3. **Observed Result**:  
Response included Chinese technical content about image alignment (no user-provided Chinese input)

**Expected Behavior**:  
Responses should remain focused on the user's query   without injecting unrelated cached data.

**Actual Behavior**:  
Irrelevant Chinese-language OpenCV tutorial content appeared mid-conversation.

**Technical Details**:  
-  Occurrence Time: 2025-02-16T04 22Z (system time)  
-  User Locale: en-AU    
-  Affected Service: Likely response generation pipeline  
-  Error Type: Probable cached data mix-up between user sessions

**Additional Context**:  
-  No Chinese input provided by user  
-  Error persisted across multiple follow-up queries  
-  User Agent: Raycast AI
-  Full conversation thread available upon request


",2025-02-16T04:54:33Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/669
665,2855067273,Could a map/reduce strategy be used to compute locally on cluster nodes and reduce data-movement overhead?, ,2025-02-15T02:58:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/665
664,2854453688,Has DeepSeek considered training a large model on Lojban? This constructed language is highly formalized and has explicit rules., ,2025-02-14T18:25:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/664
661,2852937129,How to convert a bfloat16 model to FP8?,"I used the code below in kernel.py to convert a bfloat16 model to FP8, but I cannot load the converted FP8 model with vLLM.

from typing import Tuple

import torch
import triton
import triton.language as tl


@triton.jit
def act_quant_kernel(x_ptr, y_ptr, s_ptr, BLOCK_SIZE: tl.constexpr):
    # Each program instance quantizes one contiguous block of BLOCK_SIZE elements
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    x = tl.load(x_ptr + offs).to(tl.bfloat16)
    # Per-block scale: 448 is the largest finite float8_e4m3fn value
    s = tl.max(tl.abs(x)) / 448.
    y = x / s
    y = y.to(y_ptr.dtype.element_ty)
    tl.store(y_ptr + offs, y)
    tl.store(s_ptr + pid, s)


def act_quant(x: torch.Tensor, block_size: int = 128) -> Tuple[torch.Tensor, torch.Tensor]:
    assert x.is_contiguous()
    assert x.size(-1) % block_size == 0
    y = torch.empty_like(x, dtype=torch.float8_e4m3fn)
    # One scale per block along the last dimension
    s = x.new_empty(*x.size()[:-1], x.size(-1) // block_size, dtype=torch.bfloat16)
    grid = lambda meta: (triton.cdiv(x.numel(), meta['BLOCK_SIZE']), )
    act_quant_kernel[grid](x, y, s, BLOCK_SIZE=block_size)
    return y, s
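For readers following the arithmetic, the block-wise FP8 scheme works like this. It computes one scale per block as max|x| / 448, where 448 is the largest finite float8_e4m3fn value, and divides the block by that scale. Dequantization multiplies back by the scale. The pure-Python sketch below covers only that arithmetic. It leaves out the FP8 cast itself and uses a small block size for readability. `blockwise_scale` and `dequantize` are hypothetical helpers.

```python
FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value

def blockwise_scale(values, block_size=4):
    # One scale per contiguous block: max |x| in the block divided by 448
    assert len(values) % block_size == 0
    scaled, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        s = max(abs(v) for v in block) / FP8_E4M3_MAX
        if s == 0.0:
            s = 1.0  # all-zero block: avoid dividing by zero
        scales.append(s)
        # every scaled element now lies in [-448, 448]; a real kernel would cast to FP8 here
        scaled.extend(v / s for v in block)
    return scaled, scales

def dequantize(scaled, scales, block_size=4):
    # Multiply each element by the scale of the block it belongs to
    return [v * scales[i // block_size] for i, v in enumerate(scaled)]

x = [1.0, -2.0, 0.5, 4.0, 8.0, 0.0, -16.0, 2.0]
q, s = blockwise_scale(x)
print(len(s))  # 2
```

Whatever loads the converted checkpoint needs these per-block scales alongside the FP8 weights, stored in the layout that loader expects.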
",2025-02-14T07:56:16Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/661
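660,2852538459,What does world size mean in the source code?,As the title says: what does world size mean in the source code? And what is the point of dividing by world size in the source?,2025-02-14T02:46:30Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/660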
659,2852466775,[Question] about suggestions on hardware design in DeepSeek v3 paper,"I am a hardware designer and really appreciate that the DeepSeek v3 paper included suggestions on hardware design! 

I have a few questions:
1. Could you please share the methodology of the experiment in which you measured Hopper Tensor Core FP8 GEMM accuracy? The paper states that only the highest 14 bits are calculated in the Tensor Core when doing FP8 GEMM. 
2. Do you have any suggestions for inference-focused hardware, for example the required FP8 accumulation precision, or any other general suggestions? 

Thank you!",2025-02-14T01:47:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/659
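658,2852424699,Some questions about Multi-Head Latent Attention in DeepSeek,"1. Why is the Query split, after its up-projection, into a part that needs RoPE positional encoding and a part that does not, whereas the RoPE part of the Key is obtained after the down-projection?
2. Why is the Value vector split out of the KV up-projection? Why aren't the Key and Value independent?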
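For reference, the MLA equations as written in the DeepSeek-V3 technical report are transcribed below; please double-check them against the paper.

- On the key side, only the compressed latent $\mathbf{c}^{KV}_t$ and one RoPE key $\mathbf{k}^R_t$ shared across heads need to be cached.
- Both $\mathbf{k}^C$ and $\mathbf{v}^C$ are up-projected from that same latent, which is what keeps the KV cache small.
- According to the DeepSeek-V2 paper, RoPE is decoupled into a separate key for a specific reason. Applying a position-dependent rotation to keys up-projected from the latent would prevent $W^{UK}$ from being absorbed into the query projection at inference.

$$
\begin{aligned}
\mathbf{c}^{KV}_t &= W^{DKV}\mathbf{h}_t, \qquad \mathbf{k}^C_t = W^{UK}\mathbf{c}^{KV}_t, \qquad \mathbf{v}^C_t = W^{UV}\mathbf{c}^{KV}_t,\\
\mathbf{k}^R_t &= \operatorname{RoPE}\!\left(W^{KR}\mathbf{h}_t\right), \qquad \mathbf{k}_{t,i} = \left[\mathbf{k}^C_{t,i};\,\mathbf{k}^R_t\right],\\
\mathbf{c}^{Q}_t &= W^{DQ}\mathbf{h}_t, \qquad \mathbf{q}^C_t = W^{UQ}\mathbf{c}^{Q}_t, \qquad \mathbf{q}^R_t = \operatorname{RoPE}\!\left(W^{QR}\mathbf{c}^{Q}_t\right),\\
\mathbf{q}_{t,i} &= \left[\mathbf{q}^C_{t,i};\,\mathbf{q}^R_{t,i}\right].
\end{aligned}
$$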
",2025-02-14T01:10:16Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/658
657,2851611518,[Edited] Fix minor bug in the main function,"Changes made in branch: **MayureshMore:main**
[Edited] Fix minor bug in the main function
",2025-02-13T16:58:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/657
656,2851525175,del other, ,2025-02-13T16:21:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/656
654,2850396123,demo project,"
I'm a newbie and want to learn how to generate my own model based on DeepSeek's knowledge distillation technology.
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
 
# Select the device
device = torch.device(""cuda"" if torch.cuda.is_available() else ""cpu"")
 
# Hyperparameters
epochs = 20
batch_size = 256
temperature = 4  # distillation temperature
alpha = 0.7      # weight of the soft-label loss
 
# Data loading
transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
 
train_set =   train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
 
# Teacher model (ResNet-18)
teacher = torchvision.models.resnet18(pretrained=True)
teacher.fc = nn.Linear(teacher.fc.in_features, 10)  # CIFAR-10 has 10 classes
teacher = teacher.to(device)
 
# Student model (MobileNetV2)
student = torchvision.models.mobilenet_v2(pretrained=True)
student.classifier[1] = nn.Linear(student.last_channel, 10)
student = student.to(device)
 
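# Teacher training (assumed already done here, so the teacher is loaded directly)
# In practice the teacher must be trained first; this step is skipped for simplicity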
 
# Loss functions and optimizer
criterion_hard = nn.CrossEntropyLoss()           # hard-label loss
criterion_soft = nn.KLDivLoss(reduction='batchmean')  # soft-label loss
optimizer = optim.Adam(student.parameters(), lr=0.001)
 
# Distillation training loop
for epoch in range(epochs):
    teacher.eval()   # keep the teacher fixed (eval mode)
    student.train()  # train the student
    
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        
        # Forward pass
        with torch.no_grad():
            teacher_logits = teacher(inputs)
        
        student_logits = student(inputs)
        
        # Compute losses
        # Soft-label loss (distributions softened with the temperature)
        soft_loss = criterion_soft(
            nn.functional.log_softmax(student_logits / temperature, dim=1),
            nn.functional.softmax(teacher_logits / temperature, dim=1)
        ) * (alpha * temperature * temperature)  # scale the loss (T^2 offsets the softened gradients)
        
        # Hard-label loss
        hard_loss = criterion_hard(student_logits, labels) * (1 - alpha)
        
        total_loss = soft_loss + hard_loss
        
        # Backward pass
        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()
        
        running_loss += total_loss.item()
    
    print(f'Epoch {epoch+1}, Loss:  
 
print(""Distillation finished!"")",2025-02-13T08:58:19Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/654
653,2850216671,[Question] On IBGDA implementation.,"> Additionally, we leverage the IBGDA (NVIDIA, 2022) technology to further minimize latency and enhance communication efficiency.

May I ask whether you are using NVSHMEM, libgdsync, or something else?

Thanks.",2025-02-13T07:36:50Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/653
652,2849944750,[BUG] DEEPSEEK cannot answer normal questions,"
",2025-02-13T04:48:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/652
650,2849360545,[BUG] DeepSeek - Outdated Vulnerable Software - High (8.8),"# Risk: High (8.8)
 

**NOTE: **  The score was derived from the highest vulnerability in the software dependency package. 

---
# Description:
While conducting a security review of  , we observed the noted web application used outdated 3P software with known vulnerabilities. The risks associated with using such dependencies are significant and include security vulnerabilities, data breaches, malware infections, compliance violations, and reputation damage.

---
# Impact:
These vulnerabilities can be exploited by attackers to compromise the system, steal sensitive data, infect it with malware, violate industry regulations, and harm the organization's reputation. Outdated software is often no longer supported by its developers, which means that any security vulnerabilities that are discovered are unlikely to be patched. This makes it easier for hackers to exploit these vulnerabilities and gain access to our system or steal our data.

---

# Affected Assets:
## Affected Software:
1. certifi
2. jinja2
3. tqdm
4. transformers
5. urllib3
---

# Evidence:

## depscan:
 
## snyk:
 
## trivy:
 

---
# Replicate Finding:
1. Download & Install OWASP Dependency Check:  
2. Navigate to the impacted repo (locally) & run the following command:  
3. Open either the CSV file or HTML file for results

---

#  
We recommend updating the affected software to the latest supported version. Additionally, we recommend hardening the system after patching and ensuring that all installed software, including the noted software, is patched. For more details, please see the references below!

**Be advised, the above patch should be applied to all other system(s) running the impacted software that are managed by the team.**

---

# References:

1.  2.  3.  ",2025-02-12T21:10:36Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/650
649,2849355289,[BUG] Insecure Data Processing - Timing Attack Against Secret - High (7.4),"# Risk: High (7.5)
 

**Score Explanation:**  
The vulnerability has a high confidentiality impact (C:H) because a successful timing attack can leak sensitive information. The issue is exploitable remotely (AV:N), though it requires significant effort and precision (AC:H). No privileges are required (PR:N), and the attack does not require user interaction (UI:N). Integrity is unaffected (I:N), but availability can be minimally impacted due to computational overhead (A:L).

---
# Executive Summary
DeepSeek-V3, a Mixture-of-Experts (MoE) LLM, is vulnerable to a **timing attack against secret verification** due to its use of a **non-constant-time** comparison routine in token processing. This flaw could allow an attacker to infer secret values, such as authentication tokens or cryptographic keys, by measuring response times. While DeepSeek-V3 delivers state-of-the-art performance, this issue poses a serious risk to applications that rely on it for secure processing of sensitive inputs.

---
## Detail Finding:
While performing a security code review of  , we observed the noted code repo is susceptible to a Timing Attack Against Secret. Specifically, the vulnerability arises from how **prompt_mask** is computed and used in **DeepSeek-V3's inference pipeline**. Particularly, the comparison logic at lines **59 and 68 in  ** introduces **timing discrepancies** based on token values.
1. **Line 59** ( ) checks whether a token exists but does not use a **constant-time approach**, leading to processing time variations based on token content.
2. **Line 68** ( ) introduces further timing variations when checking the **end-of-sequence (EOS) token**.

The timing attack vulnerability we’ve identified in DeepSeek-V3 aligns with the following **OWASP Top 10 for Large Language Model (LLM) Applications 2025** categories:
1. **LLM02:2025 Sensitive Information Disclosure**: This category addresses scenarios where LLMs inadvertently expose confidential data, including personal identifiable information (PII), financial details, or proprietary business information. In the context of DeepSeek-V3, the timing attack could allow attackers to infer sensitive information by analyzing response times, leading to unauthorized data access and privacy violations. 
2. **LLM08:2025 Vector and Embedding Weaknesses**: This risk pertains to vulnerabilities in systems utilizing vectors and embeddings, especially in Retrieval Augmented Generation (RAG) setups. Weaknesses in how vectors and embeddings are generated, stored, or retrieved can be exploited to inject harmful content, manipulate model outputs, or access sensitive information. In DeepSeek-V3, the non-constant-time verification routine could be exploited to manipulate embeddings, leading to potential data leakage or unauthorized access. 

---
## Impact:
An attacker could craft special inputs, measure response delays, and infer **private data** by exploiting these timing inconsistencies. Additionally, an attacker could perform the following attacks:
- **Sensitive token leakage:** Attackers could extract secret keys, authentication tokens, or model-internal data through statistical analysis of response times.
- **Potential model poisoning:** If DeepSeek is used in a **multi-tenant environment**, adversaries could deduce how different input sequences affect the model’s state.
- **Increased risk in security-critical deployments:** AI-driven **access control mechanisms, chatbots handling confidential queries, or secure computations** are at risk.

While DeepSeek-V3 offers **efficient inference and high performance**, this timing flaw could undermine its security, especially in sensitive use cases.

---
# Affected Assets:
## Affected File(s):
-  
-  

---
# Evidence:
##  
 

##  
 

---
# Replicate Finding:
1. Clone the impacted repo:  
2. Navigate into the noted repo
3. Open the impacted file(s)
4. Go to the impacted line

---
#  
We recommend implementing **constant-time operations** for secret-dependent comparisons. Instead of directly comparing tokens, leverage **PyTorch’s optimized cryptographic-safe operations**. Additionally, we recommend the following hardening steps:
1. **Use padding techniques** to mask timing variances.
2. **Normalize execution times** to reduce distinguishability.
3. **Conduct differential analysis** to detect timing discrepancies in responses.
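Constant-time comparison matters when a secret, such as an API key, session token, or HMAC, is compared against attacker-supplied input. Python's standard library provides `hmac.compare_digest` for this purpose. The example below is a generic illustration only, not a patch for the files referenced in this report. `tokens_match` is a hypothetical wrapper.

```python
import hmac

def tokens_match(provided, expected):
    # compare_digest avoids content-based short-circuiting, so timing does not reveal
    # how many leading bytes matched
    return hmac.compare_digest(provided.encode('utf-8'), expected.encode('utf-8'))

print(tokens_match('sk-abc123', 'sk-abc123'))  # True
print(tokens_match('sk-abc123', 'sk-abc124'))  # False
```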

## **Fix (Using Constant-Time Masking)**

Modify ** ** to ensure **constant-time computation**:

 
For more information & context, please see the reference section below.

**Please ensure that the above patch is applied to all affected software, services, applications, instances or systems managed by the team.**

---
# References:
1. Constant-Time Comparison in PyTorch: [https  
2. Preventing Timing Attacks in Deep Learning: [https  
3.  4.  ",2025-02-12T21:07:49Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/649
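645,2847370884,How do I make money? Seeking expert advice,"Too few users pay for the consulting platform I built by wrapping the API; everyone chooses the free apps wrapped by big companies.
How can an individual entrepreneur break through?",2025-02-12T07:10:26Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/645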
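644,2846989669,Does Deepseek currently have a technical community platform?,"**Is there currently a technical community platform?**

I'd like to ask about using Deepseek on Rocky Linux 8.9 (64-bit), specifically the following:

Sharing experience with AI tools
Record everyday usage habits and exchange best practices to help more people use AI tools efficiently.

Reviewing the latest AI products
Compile newly released AI tools in one place with detailed usage advice and objective reviews to help users quickly understand their pros and cons.

Exploring model deployment and applications
Dig into local deployment, fine-tuning methods, and real-world use cases for models such as DeepSeek, and share technical experience and solutions.

Where technology and ideas meet
Build an open community that fosters free exchange of technology and ideas and sparks more innovation.

I look forward to discussing this with everyone and helping promote the adoption of AI technology!",2025-02-12T02:04:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/644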
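643,2846979759,Deploying DeepSeek as a Docker service on Rocky Linux 8.9 (64-bit),"I would like to ask about deploying the DeepSeek API on Rocky Linux 8.9 (64-bit), specifically:

1. Can the DeepSeek service be deployed on its own with Docker, similar to how other services are deployed (such as link-ai, which connects to a WeChat bot)?
2. Alternatively, is there a standalone deployment option similar to ChatGPTNextWeb?

I look forward to your answer. Thank you!",2025-02-12T01:55:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/643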
641,2845452444,Add Turkish Translation for README and README_WEIGHTS ," **Add Turkish Translation for README and README_WEIGHTS**  

**Description:**  
This PR adds a Turkish translation for the following documentation files:  
-    
-    

The translation ensures both **technical accuracy** and **Markdown compatibility** while maintaining the original document structure.  

Let me know if any modifications are needed! 🚀  
 

My email: can.deliktas@protonmail.com",2025-02-11T13:49:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/641
638,2844578674,Is tool calling supported on the open-source model?,"Does the open-source model support tool calling? If so, how should it be invoked?",2025-02-11T08:10:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/638
636,2844205735,Create 陈诚,"Write a paper for me
",2025-02-11T03:48:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/636
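630,2841540080,How much GPU memory is needed to deploy DeepSeek-V3 in BF16 and in FP8 respectively?,Everything I have found so far comes from third-party blogs; could the official documentation be updated?,2025-02-10T07:12:22Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/630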
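629,2841469683,Seeking a tutorial or guidance on training a large language model (LLM) from scratch,"Hi everyone! I'm an LLM beginner and recently wanted to learn how to train a large model from scratch, but the tutorials I find online are either too fragmented or too demanding.
I hope experienced experts can provide a step-by-step tutorial that helps me train a large model from zero to one.

### Challenges I am currently facing:
Data preparation: How do I prepare high-quality training data? How much data is enough?
Model architecture: How do I choose a suitable model architecture (e.g., Transformer, GPT, BERT)?
Training environment: How do I set up the training environment (GPU/TPU configuration, training frameworks such as PyTorch or TensorFlow)?
Optimization and tuning: How do I tune hyperparameters to make sure the model converges?
Distributed training: How do I train across multiple machines or GPUs to avoid memory limits?
Other questions: How do I evaluate training results and avoid problems such as overfitting?

### Help I am hoping for:
Are there recommended tutorials or documents that explain end to end how to train a large model?
If there are related GitHub projects or open-source code, please share those references too!
Any practical experience and advice is greatly appreciated, especially on how beginners can avoid common pitfalls.
Thank you all very much for your help; I look forward to your valuable input!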

",2025-02-10T06:29:29Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/629
627,2841245345,Create DeepSeek-V3،, ,2025-02-10T03:34:50Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/627
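626,2841243478,How can I guarantee that the same input produces exactly the same output across multiple generations?,"I adjusted the following parameters:
tem=0
top_p=1
seed=123
but I still cannot get consistent output across multiple generations for the same input.
Does anyone have a solution...",2025-02-10T03:32:55Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/626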
625,2841124774,DeepSeek development platform,Does DeepSeek have its own development platform that enterprises can use to build their own agents?,2025-02-10T01:48:05Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/625
624,2841120859,Can a laptop 4060 GPU run the DeepSeek model locally?,Thanks for your great contribution to LLM development. I would like to know whether a laptop 4060 GPU can run DeepSeek locally.,2025-02-10T01:44:01Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/624
623,2841120041,How can I fine-tune or distill the DeepSeek model on my own data? Or should I use traditional LoRA? There seems to be no official guidance., ,2025-02-10T01:43:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/623
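621,2840715901,What is the fastest way to move from autonomous-driving algorithm engineer to large-model algorithm engineer?,"My daily work involves Transformers. A previous project fused LiDAR and camera data, which I can draw on. My daily work also includes some model-compression work, such as mixed-precision training. I also use OpenMMLab for distributed training. What is the fastest way for me to move from autonomous-driving algorithm engineer to large-model algorithm",2025-02-09T14:27:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/621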
619,2840323186,Numerical Stability in Scaling Factor Computation (s = tl.max(tl.abs(x)) / 448.),"However, it's unclear why 448. was chosen as the divisor. This fixed value might not be optimal for all datasets, potentially leading to numerical instability in cases with extreme outliers or varying distributions.

Why was 448. chosen? 
Was this value derived empirically, or is there a theoretical justification?

If this was tuned for a specific dataset or hardware, it would be useful to document the rationale behind 448.. ",2025-02-08T23:19:02Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/619
616,2839772903,chore: update README.md to improve layout, ,2025-02-08T10:30:16Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/616
615,2839737326,"It's not a multimodal model, so why can it do multimodal understanding?","V3 does not appear to be a multimodal model, but when I use the web application (chat.deepseek.com), it can accept uploaded images and understand their content. I'm curious how it does this.",2025-02-08T09:27:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/615
614,2839733815,Not Working,"My first prompt works properly, but every prompt after the first attempt fails. Please investigate why this is happening.",2025-02-08T09:18:40Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/614
613,2839719616,Enhancing DeepSeek 70B Usability in Low-Resource Environments,"#### **Description**  
DeepSeek 70B is a powerful language model that performs exceptionally well on high-performance hardware (e.g., 8xA100 80GB). However, its deployment is challenging in low-resource environments (e.g., consumer GPUs or CPU-only servers).  

#### **Problem Statement**  
Currently, DeepSeek 70B has high VRAM requirements, making local deployment difficult for many small businesses and individual developers. Are there any plans to improve accessibility through the following optimizations?  

1. **Multi-GPU Optimization for Low VRAM**  
   - Implement more efficient model parallelization techniques (e.g., ZeRO-Offload, FlashAttention) to reduce memory consumption.  

2. **Quantization Support**  
   - Provide 4-bit or 8-bit quantized versions of the model, allowing it to run on consumer GPUs (e.g., RTX    

3. **Optimized Inference API**  
   - If local deployment remains costly, is there a plan to offer a more optimized cloud API (similar to OpenAI's API) with reduced cost for inference?  

#### **Expected Benefits**  
- Enables more developers to run DeepSeek 70B on local machines or small-scale servers, fostering adoption in small businesses and research communities.  
- Lowers deployment barriers, making open-source large models more practical and accessible.  
",2025-02-08T08:47:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/613
610,2839513084,[BUG],"from dsk.api import DeepSeekApi

No module named dsk.api

How can I fix it?

Thanks.",2025-02-08T04:27:19Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/610
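609,2839412785,[BUG] Appeal for help persuading the company to take DeepSeek's accessibility issues seriously,"Hello! I am a visually impaired user of DeepSeek. Because visually impaired users cannot see the screen directly, we must use a screen reader to use DeepSeek. However, when I open the DeepSeek page, I find that its accessibility is very poor and visually impaired users can barely use it. I have therefore submitted several fix requests on GitHub. Accessibility means developing web pages and applications so that they can be used not only by people without disabilities but also by visually impaired users like me.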

Here are the related issues I have submitted on GitHub:  
1. [BUG] Accessibility Enhancement: Screen Reader Support for Toggle Buttons in DeepSeek Chat #246  
2. [BUG] Accessibility Improvement for Screen Reader Users in DeepSeek v3 Chat Feature #233  
3. [BUG] Accessibility Issue with DeepSeek v3 - Impossible reCAPTCHA for Blind Users #220  

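The problem is that, although I raised these issues on GitHub, the service team has neither responded nor taken any action. This makes me wonder whether they are ignoring my feedback. Perhaps because people with disabilities are a minority, the company does not care much about these issues. In this situation, we need to push for change more actively.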

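If you can, please join me in writing to DeepSeek in China and urging them not to ignore the needs of users with disabilities. Otherwise, I believe DeepSeek will not prioritize accessibility, because it brings no direct economic benefit. In fact, the problems I reported are not hard to fix. I am a developer myself; I not only described the specific problems but also shared solutions, code, and implementation steps, yet they have ignored all of it.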

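To be fair, among the issues I raised, they did show some interest in the one where the security CAPTCHA prevents screen-reader users from logging in. That issue is still unresolved, however, and the others have been ignored entirely. We therefore need more people to get involved and put more pressure on DeepSeek, so that they understand that a service that ignores users with disabilities cannot be truly excellent.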

I sincerely ask everyone to join in and help drive this change. Thank you!",2025-02-08T02:22:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/609
608,2839376353,"Does deepseek-v3 not yet support json_object structured output? Requests keep returning 400, while the deepseek-v2.5 model works", ,2025-02-08T01:14:26Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/608
607,2838875403,docs(readme): improve table formatting and readability,"This PR optimizes table styling in README to:
**Enhance visual consistency**  
   - Unified column alignment with proper markdown pipe syntax
   - Fixed irregular   spacing

This change is ready for immediate merge as it contains no breaking changes.

",2025-02-07T19:00:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/607
606,2838499397,[BUG] API Error Report: Frequent JSONDecodeError with deepseek-reasoner Model,"**Issue Description**

I am running a series of AI agents using langgraph. Each agent sends an API call to the deepseek-reasoner model. Sometimes it gets stuck at the first agent, sometimes at the second or third. I have also implemented **five retries per agent**, but all five retries produce the same error.

When making API calls to the deepseek-reasoner model, I am consistently receiving a JSONDecodeError: Expecting value error. This occurs even when the API returns an HTTP 200 status code, but the response body appears to be empty or malformed.

Here is a snippet of the code I am using:

from langchain_openai import ChatOpenAI

llm_deepseek = ChatOpenAI(
    model='deepseek-reasoner',
    base_url=""<base_url>"",
    api_key=""<my_api_key>"",
    temperature=0,
    model_kwargs={""response_format"": {""type"": ""text""}}
)
response = llm_deepseek.invoke(""formatted_prompt"")

**Error Details**

The full traceback of the error is as follows:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)


**Steps Taken**

To troubleshoot, I have:

- Verified that my API key is valid and has sufficient quota.
- Checked the API status page (status.deepseek.com) for any ongoing issues.
- Implemented retry logic and fallback mechanisms, but the issue persists.
",2025-02-07T15:53:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/606
605,2838187201,[Suggestion] Delete meaningless issues.,"### Just personal opinion.
There are many meaningless issues in this repo, as I mentioned in  . I'll list a few meaningless or ad-like issues here so you can delete or close them:

1.  #601 answered.
2. #597 advertising.
3. #416 The author doesn't seem to know what a pull request is.
4. #241 answered.
5. #231 answered.
6. #171 ~~What can I say?~~ Just meaningless.
7. #169 I'm a bit hesitant about this issue. Somehow I think a README is enough.
8. #23 answered.

……",2025-02-07T13:38:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/605
604,2837899871,General pre-processing question,"Hello everyone,

The model is amazing, but I have a question: when I share a PDF document with the model and ask a question about it, how does DeepSeek process the document? Has this part also been shared?
I ask because OCR with Tesseract or text extraction (fitz, PyPDF2, pdfplumber, ...) is not always accurate.
Is there any documentation on how the pre-processing should be done? I am using the model locally.

Thanks",2025-02-07T11:15:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/604
603,2837853345,Is there a plan to support computer use?,"The Claude 3.5 Sonnet model supports computer use, which can greatly improve efficiency in AI programming plug-ins. DeepSeek is a great project with good code-generation ability, and I hope it will support computer use as well.",2025-02-07T10:52:21Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/603
602,2837544807,Calling the DeepSeek API with HTTP requests from Java is extremely slow: is there a solution?,"When I call the DeepSeek API with HTTP requests from Java, responses are abnormally slow and inconsistent. Neither OkHttpClient nor HttpClient solves the problem. I'd appreciate any advice.",2025-02-07T08:26:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/602
601,2837429482,The official WeChat group is full. Could you open a second group?,"As the title says.
Thanks",2025-02-07T07:22:39Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/601
600,2837309984,Minor grammatical tense corrections to README.md,Minor changes to correct grammatical tense for activities that took place in the past.,2025-02-07T06:02:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/600
599,2837243861,[BUG] Frequent JSONDecodeError with DeepSeek API," 

This problem frequently occurs when I use the deepseek-API and has been going on for several days. Even the deepseek sample request (shown below) has this error frequently.

 ",2025-02-07T05:02:44Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/599
597,2836399320,DeepSeek group,"Unofficial group, for learning and discussion only

",2025-02-06T19:30:57Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/597
596,2836320330,[BUG] I can't run DeepSeek V3 using SGlang,"**Describe the bug**

When I run this code
 

I get a 404 Not Found error. The API call is reaching the server:


**To Reproduce**

I run DeepSeek V3 on SGlang using this recipe (Docker version):  
I'm using 4 cluster nodes with 4 NVIDIA A100 GPUs each. Here is the command:

 
On the other 3 hosts I change only the  parameter


**Expected behavior**
Get a response from the API.

**Additional context**
One strange behavior is that the server came up on the 3rd node, not on the master.
",2025-02-06T18:46:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/596
595,2835490582,"Does inference line 61 use only one token to predict the next token, rather than the whole sentence?", ,2025-02-06T13:08:36Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/595
594,2835459373,DeepSeek-V3-lite naming conventions?,"Hello, I am currently working on a pruned version of DeepSeek V3.

The methodology involves layer-wise routed-expert pruning and distillation, followed by post-training on the full model.
I have already tested the pipeline on DeepSeek V2 Lite, reducing 64 experts to 16, and it seems to give correct results.

I just started running the same method on DeepSeek V3 with the following pruning targets:
Base model: 256 => DeepSeek-V3-671B
22 => DeepSeek-V3-Lite-72B
16 => DeepSeek-V3-Lite-57B
8 => DeepSeek-V3-Lite-36B
4 => DeepSeek-V3-Lite-26B
I'll upload them to Hugging Face when the pipeline finishes running (it should take about 3 days on my 2x3090 rig).

Do you authorize me to use the naming convention above for the uploads?

If the methodology gives good results, I'll apply it to R1 and R1-Zero as well.",2025-02-06T12:55:33Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/594
593,2835201696,chore: add issue template config and fix documentation issues,"### 📖 Summary

This pull request includes minor but meaningful improvements to the repository to enhance its maintainability, documentation, and user experience.

### 🛠️ Changes

1. **Add Issue Template Configuration**
    - This change allows users to either create blank issues or access support via the provided WeChat group link, improving accessibility for non-technical users or those seeking assistance outside of GitHub.
2. **Update Readme File**
    - Corrected the contact email link in the documentation to ensure it directs users to the correct address.
    - Adjusted the casing of the BibTeX citation to follow standard academic conventions, improving readability and professionalism in citations.

### 🚀 Impact

- The addition of issue templates improves the overall contributor experience by reducing ambiguity and ensuring that issues are reported in a clear and actionable format.
- Fixing the contact email link ensures that users can reach out for support without encountering errors.
- Standardizing the BibTeX citation enhances the credibility and usability of the repository for academic and research purposes.

---

Let me know if there are any questions or further clarifications needed 🙌",2025-02-06T10:58:55Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/593
592,2835085182,"[Paper BUG] Conflict between Figure 3, formula 21 and formula 22","The conflict is that Figure 3 and formula 21 indicate that the input of $$TRM_k$$ is the T-k tokens (k:T-k), while formula 22 indicates that the input of $$TRM_k$$ is the T-k tokens (1:T-k).
This is clearly visible in Figure 3.

Also, according to formula 21, since the word embedding is shifted by k, we can conclude that the token at the i-th position of $$TRM_k$$'s input should be the (i+k)-th token of the whole sentence.

However, the subscript in formula 22 is 1:T-k.

",2025-02-06T10:08:06Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/592
591,2834805802,Error response from API: Expecting value: line 1 column 1 (char 0),"I still cannot get responses from the API, even with the shortest context. Please fix this as soon as possible.

Here is the full error:
 ",2025-02-06T07:54:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/591
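590,2834745020,Suggestion about server-busy errors,"With so many users, could you add more servers, or let users wait in a queue instead of having the API keep returning errors?",2025-02-06T07:19:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/590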
589,2834583451,Is it possible to send requests to DeepSeek directly by voice?,"**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
",2025-02-06T05:28:55Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/589
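588,2834499431,"When calling the DeepSeek API, the returned message says it was “developed by OpenAI”","Problem description
When calling the DeepSeek API, the returned message states that it was “developed by OpenAI”. This appears to be an error, since I know the API is provided by your company, not by OpenAI.

Steps to reproduce
Call your company's API endpoint via an HTTP request using an authorized API key.
After sending the request, inspect the returned response.
The returned message says it was “developed by OpenAI”.

Expected behavior
When calling your company's API, the returned message should clearly state that the service is provided by your company, rather than saying “developed by OpenAI”.

Screenshots

Additional information
I confirm that I am using the API provided by your company, not any other third-party service.
This message may confuse users; I suggest fixing it as soon as possible so that users can correctly identify the source of the service.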


",2025-02-06T04:14:20Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/588
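587,2834452171,When will the API support a batch feature?,"As the title says: please offer a batch feature similar to OpenAI's or Qwen's.
Users would not need a real-time response after submitting, and it would also help reduce server load.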
",2025-02-06T03:37:01Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/587
586,2834399016,API responses take too long and frequently return an empty string, ,2025-02-06T02:48:05Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/586
585,2833739082,R1 vs V3,What's the difference between DeepSeek R1 and DeepSeek V3?,2025-02-05T19:01:45Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/585
583,2833499965,[BUG] Replying or asking a new question usually says server is busy on web client,"**Describe the bug**
Replying or asking a new question usually says server is busy

**To Reproduce**
ask a new question or reply

**Expected behavior**
The model should actually reply instead of saying the server is busy.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Additional context**

",2025-02-05T17:05:32Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/583
581,2832984298,"[BUG] When I upload a file it shows 100% uploaded, but the file is never parsed, even after waiting 5 minutes.","**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.


**Additional context**
Add any other context about the problem here.
",2025-02-05T13:40:45Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/581
579,2832836181,Issue: Incorrect comment in `linear` function,"The comment in the   function regarding   contains inaccuracies:

 
This comment has two problems:

1. **Incorrect ""quantized"" condition:**    indicates that the   tensor is **not** quantized.  Quantized tensors have an element size of 1, while higher precision formats like float32 and bfloat16 have element sizes greater than 1.

2. **Misleading ""dequantized version"" phrase:**  The code does not perform any dequantization when  . It directly uses the original   tensor.


Proposed solution:

Change the comment to accurately reflect the code's logic. A more accurate comment would be:

 
This revised comment clarifies that no dequantization is performed and the original higher-precision weights are used directly when  .
",2025-02-05T12:38:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/579
578,2832827060,[BUG]Screenshot upload is still failing,"**Describe the bug**
For two days straight, image uploads have failed; the upload never even reaches the pending state and fails immediately.

**To Reproduce**
Just try to upload an image.


**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**

 
",2025-02-05T12:35:16Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/578
577,2832485625,"How many GPUs are needed to deploy DeepSeek-V3?","Since DeepSeek-V3 was trained in FP8, does the 671B model need only about 700 GB of GPU memory?",2025-02-05T10:08:57Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/577
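576,2832483851,DeepSeek technical discussion group,"I'm a beginner who wants to learn more in depth, and I hope to meet people with the same interests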

",2025-02-05T10:08:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/576
575,2832435039,The WeCom group is full. Please open another group.,"The WeCom group is full. Please open another group.
",2025-02-05T09:47:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/575
574,2832039211,Question about SM partitioning?,"Thank you for your insightful work on overlapping compute kernels and communication kernels.

In your technical paper, you employ the warp specialization technique and partition 20 SMs into 10 communication channels.

My question is how to implement SM partitioning on an NVIDIA GPU. If NCCL_MAX_NCHANNELS is used to limit the communication kernels, how do the compute kernels then use the remaining SMs?

I appreciate any insights you can offer on this matter.

Thank you for your assistance.",2025-02-05T06:33:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/574
572,2831789333,Add a pause feature to the web client,Generation cannot be paused; users have to wait a long time,2025-02-05T03:42:13Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/572
570,2831583984,Optimizing ,"Changes:

init_distributed function: Extracted the distributed setup logic into a separate function.
sample function: Modified it to use torch.multinomial instead of an exponentiation-based approach for sampling.
Argument Validation: Replaced the assert with a more user-friendly validation in main to ensure that at least one of the parameters (input-file or interactive) is provided.
Interactive Code Refactoring: The user interaction logic was kept, but the init_distributed function is now called separately at the beginning of main.


Refactored init_distributed function: Extracted distributed setup logic into a separate function.
Updated sample function: Replaced exponential approach with torch.multinomial for sampling.
Improved argument validation: Replaced assert with a more user-friendly validation in main to ensure at least one parameter (input-file or interactive) is provided.
Refactored interactive mode logic: Maintained user interaction logic but moved init_distributed call to the beginning of main.

",2025-02-05T00:35:17Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/570
569,2831495250,File upload support when web option is selected along with DeepThinkR1,"It would be a great feature if we could upload files when using DeepThink (R1) together with the web search option in the DeepThink chat.
",2025-02-04T23:27:21Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/569
566,2831063861,Please provide the code for your model architecture.,"**Is your feature request related to a problem? Please describe.**
This repo only provides weights. It makes it difficult to confirm claims from the article.

**Describe the solution you'd like**
 A repo where the code to the model architecture is provided.  

**Describe alternatives you've considered**
Clearly state that the model is not open source. 

**Additional context**
None
",2025-02-04T19:02:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/566
565,2830967849,updated Model Summary verbiage to be past tense for easier understanding,Title,2025-02-04T18:12:46Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/565
564,2830934952,Update requirements.txt,"The current pip index does not provide 'torch' version 2.4.1 or 'triton' version 3.0.0, so 'requirements.txt' has been updated to minimum versions that meet the current pip installation requirements",2025-02-04T17:56:43Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/564
563,2830767351,fix(fp8_cast): Add robust memory management and error handling,"
- Add try-catch block for memory management operations
- Implement graceful error handling for memory allocation failures
- Add explicit CUDA memory cleanup
- Protect against potential race conditions in file loading

This change improves stability when converting large models by:
- Preventing crashes from out-of-memory conditions
- Ensuring proper cleanup of GPU resources
- Adding error reporting for debugging",2025-02-04T16:37:15Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/563
562,2830318875,"[BUG] Encountering so much ""The server is busy. Please try again later.""","
I run into this constantly. Even a small task can't be completed because I get ""The server is busy. Please try again later.""
Please fix it; users are having a bad experience and will leave.  ",2025-02-04T13:44:58Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/562
560,2829729397,Huawei Ascend Benchmarking Reports,"**Is your feature request related to a problem? Please describe.**
I'm looking for a comparison of Huawei Ascend throughput on NPUs versus SGLang on GPUs.

**Describe the solution you'd like**
Could someone point me towards data on this? Thanks
",2025-02-04T10:25:49Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/560
559,2829677323,[BUG] DeepSeek Web Unresponsive,"

",2025-02-04T10:03:39Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/559
558,2829593744,"[BUG]ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao', 'bitnet']","Traceback (most recent call last):
  File   line 3, in <module>
    model =   trust_remote_code=True)
  File   line 559, in from_pretrained
    return model_class.from_pretrained(
  File   line 3647, in from_pretrained
    config.quantization_config = AutoHfQuantizer.merge_quantization_configs(
  File   line 173, in merge_quantization_configs
    quantization_config = AutoQuantizationConfig.from_dict(quantization_config)
  File   line 97, in from_dict
    raise ValueError(
ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao', 'bitnet']",2025-02-04T09:28:20Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/558
557,2829325263,[BUG]safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge,"python convert.py --hf-ckpt-path   --save-path   --n-experts 128 --model-parallel 4
  0%|                                                                                                  |   [00:00<?,  
Traceback (most recent call last):
  File ""convert.py"", line 96, in <module>
    main(args.hf_ckpt_path, args.save_path, args.n_experts, args.model_parallel)
  File ""convert.py"", line 51, in main
    with safe_open(file_path, framework=""pt"", device=""cpu"") as f:
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge",2025-02-04T07:13:01Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/557
556,2829152356,Fix Linear Layer Bias Initialization,"## Description
Fixed bias initialization in the Linear class by using   instead of the undefined  . This fix ensures proper bias initialization for all linear layers in the model.

## Changes Made
- Modified   to use   parameter for bias tensor initialization
- Ensures consistency with parent and child classes (ColumnParallelLinear and RowParallelLinear)

## Why This Change is Needed
The previous implementation tried to access   which is only defined in child classes (ColumnParallelLinear), causing potential issues when the Linear class is used directly. Using   is the correct approach as it's always available and matches the weight tensor's output dimension.

## Testing Done
- Model initialization works correctly with bias enabled
- Compatible with both standard and parallel linear layers
- No impact on existing functionality
## Checklist
- [x] Code follows the project's coding style
- [x] Changes are backward compatible
- [x] No new dependencies added",2025-02-04T05:10:17Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/556
554,2828171371,[BUG] File submission failed,"**Describe the bug**
Upload of documents to chat keeps failing.

**To Reproduce**
Select the upload button, choose a file (e.g., a PDF or DOCX within the required limits), and click Open.

**Expected behavior**
Takes a second or two and displays ""Upload failed"".

**Screenshots**


",2025-02-03T17:58:47Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/554
552,2828029726,The server is busy. Please try again later.,"The server is busy. Please try again later.

fix this ",2025-02-03T16:48:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/552
551,2828013530,Request for a voice mode similar to ChatGPT,I have been using ChatGPT for some time; it would be good if DeepSeek provided the same voice mode ,2025-02-03T16:40:52Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/551
550,2827736315,Summarize Chat Title with Content,"**Is your feature request related to a problem? Please describe.**
Currently, every chat's title is set to the first prompt the user sends. Hence, all the titles don't really explain the entire conversation. 

**Describe the solution you'd like**
It would be great if the title could be a summary of the conversation (to be more general to what the user asked in the first prompt, and consistent with what the model's response was).

**Describe alternatives you've considered**
They can just be keywords if summarization is hard to achieve. 

**Additional context**
Here, you can see that the title of the chat is the prompt I entered into the model. 

",2025-02-03T14:49:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/550
549,2827070199,Fix typo and grammar,"Fixes issue #456
Fixed typos and grammar",2025-02-03T10:13:30Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/549
548,2826844393,[BUG],"**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Additional context**
Add any other context about the problem here.",2025-02-03T09:02:06Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/548
547,2826733396,[BUG] torchrun subprocess received Signal 8 (SIGFPE),"**Describe the bug**


**To Reproduce**
 
",2025-02-03T08:24:54Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/547
546,2826630590,Improve RTL (Right-to-Left) Text Display,"Dear DeepSeek Development Team,

I’d like to report a display issue affecting RTL (right-to-left) languages such as Persian, Arabic, and others on your platform. Currently, texts in these languages do not render correctly due to improper text direction alignment.

Issue Details:
Problem: RTL texts (e.g.,   are displayed with incorrect text direction, causing misalignment and readability issues.

Affected Element: The CSS class ds-markdown ds-markdown--block lacks proper RTL styling.

Proposed Solution:
Adding the CSS property direction: rtl; to the class ds-markdown ds-markdown--block will resolve the issue. This simple adjustment ensures RTL texts align correctly from right to left.

Code Suggestion:

css
.ds-markdown.ds-markdown--block {  
    direction: rtl;  
}  
Expected Impact:
Proper RTL text alignment for languages like Persian and Arabic.

Improved readability and user experience for RTL language users.

Additional Notes:
This fix addresses fundamental RTL rendering but could be extended to handle numerals or punctuation if needed.
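One caveat about the CSS above: applying `direction: rtl` unconditionally would also right-align English or Chinese answers. An alternative is to choose the direction for each block from its own text. The sketch below uses Python's standard-library `unicodedata` and applies the first-strong-character rule that HTML's dir=auto is based on. `detect_direction` is a hypothetical helper, not DeepSeek code.

```python
import unicodedata


def detect_direction(text: str) -> str:
    # First-strong-character rule, the same idea HTML uses for dir=auto:
    # the first character whose bidi class is L, R or AL decides the direction.
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ('R', 'AL'):  # Hebrew (R), Arabic/Persian letters (AL)
            return 'rtl'
        if bidi_class == 'L':  # Latin, CJK and other left-to-right letters
            return 'ltr'
    # Only weak or neutral characters (digits, punctuation, spaces): fall back to ltr.
    return 'ltr'


for sample in ('Hello, world', 'سلام دنیا', '2025 سال'):
    print(sample, '->', detect_direction(sample))
```

In the page itself, setting dir=auto on each markdown block may give the same result without custom code, because the browser then applies this rule to each element.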

Tested on [Your   and confirmed resolution (optional: include technical details like OS, browser, or device).",2025-02-03T07:27:51Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/546
545,2826541207,DeepSeek Temporary Service Unavailability Error,"The DeepSeek service is temporarily unavailable due to technical issues. The error message indicates that the platform cannot access real-time information or the web at this time.

To Reproduce

Open DeepSeek.
Attempt to search for real-time information.
Observe the error message that indicates the service is unavailable.


I am curious if anyone has used the 'Search' functionality within DeepSeek",2025-02-03T06:33:33Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/545
544,2826442096,[BUG] Unable to attach files,"Unable to attach files since the afternoon of Feb 2.
",2025-02-03T05:19:49Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/544
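543,2826297420,Suggestion regarding image recognition,I hope image recognition can recognize not only images that contain text but also images without any text.,2025-02-03T03:26:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/543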
542,2826213916,[BUG] Load model,"When I try to load the DeepSeek model:
Attempt 1: 
 
ERROR 1:  ValidationError: 1 validation error for VLLM
 
Attempt 2:
 
ERROR 2:
 

Why does this happen, and how can I solve it?",2025-02-03T02:05:56Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/542
540,2825935147,[FIXED] [BUG] API Platform under maintenance,"**Error Description**
The DeepSeek API platform has been under maintenance for a long time.

**Steps to Reproduce**
1. Try to access the DeepSeek API platform.

**Expected Behavior**
I expected to be able to access the platform, or at least to get clear information about when the maintenance would be completed.

**Screenshots**


**Additional Context**
The wait is too long; I suspect that the site is blocked in some  ",2025-02-02T18:09:52Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/540
539,2825903196,GPU Inferencing: CUDA vs PTX,"For GPU inferencing, do you (DeepSeek AI) use CUDA or PTX for your commercial service? Also, in general, for open-source GPU inferencing software, do you advise using one or the other? What are the expected gains?",2025-02-02T17:03:14Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/539
538,2825894970,[BUG] File upload issue,"

### NOT ABLE TO UPLOAD FILES

**Whenever I upload a file, it fails to upload.**
**Please fix this issue.**

",2025-02-02T16:45:15Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/538
536,2825890165,[BUG] can't upload documents,"**Describe the bug**
I can't upload documents; the upload gets stuck at 100%.

**To Reproduce**
Steps to reproduce the behavior.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.


**Additional context**
Add any other context about the problem here.
",2025-02-02T16:34:54Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/536
535,2825886842,[BUG] can't upload images,"**Describe the bug**
Uploading an image returns an ""upload failed"" error.

**To Reproduce**
1. Try to upload an image.

**Expected behavior**
The image uploads normally.

**Screenshots**


**Additional context**
Uploading over a VPN also doesn't work.
",2025-02-02T16:27:57Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/535
534,2825867559,Server issue: I have been trying to access DeepSeek for 3 days now,"I have been trying to access DeepSeek for three days now, but I always get a server error response. What's happening? Are you facing malicious attacks on your servers, or is it something else?
 

",2025-02-02T15:46:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/534
532,2825533249,[BUG] Not able to download model through HuggingFace,"**Describe the bug**

Getting the following error:
 

**To Reproduce**

Following the code example on  
 

**Additional context**
Refer to this PR:  ",2025-02-02T01:05:24Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/532
530,2825474431,"Saw the file to spy on it, hehehehe", ,2025-02-01T22:28:14Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/530
529,2825381326,eighty-column width on license,"Closes  

 
Suitable for reading on 80-column dumb terminals.

A newline has been added on the last line.",2025-02-01T19:18:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/529
526,2825319616,[BUG] XSS Vulnerability in DeepSeek AI,"A Cross-Site Scripting (XSS) vulnerability has been identified in DeepSeek AI, which allows an attacker to inject and execute arbitrary JavaScript code. This vulnerability could be exploited to compromise user sessions, steal sensitive information, or conduct phishing attacks.

### Steps to Reproduce:
1. Inject the following payload into an input field that reflects output without proper sanitization:  

 <iframe srcdoc=""<p>Ethically hacked by 0xSaikat (হা.. হা.. হা.. এটাই বাস্তব, I love   onload=""alert('XSS by 0xSaikat - (হা.. হা.. হা.. এটাই বাস্তব, I love  

2. When the affected page loads, the JavaScript executes, displaying an alert box.


### Expected Behavior:
- The application should sanitize user input and prevent script execution.
- HTML tags and JavaScript should not be rendered or executed.
- The input should be displayed as plain text if reflected.

### Actual Behavior:
- The input is improperly sanitized, allowing execution of the injected JavaScript.
- The alert box appears, confirming the execution of arbitrary JavaScript in the victim's browser.
- This can lead to session hijacking, phishing attacks, or malicious redirections.

### PoC:   
### Impact:
- Malicious actors could use this vulnerability to execute arbitrary JavaScript in a victim's browser.
- Possible session hijacking, credential theft, and phishing attacks.

### Recommendation:
- Implement strict input validation and output encoding (e.g., using htmlspecialchars() or equivalent).
- Use a Content Security Policy (CSP) to restrict inline script execution.
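Here is a minimal illustration of the output-encoding recommendation. Python's standard-library `html.escape` plays the same role as PHP's `htmlspecialchars()`. `render_user_text` is a hypothetical helper, and the payload is a simplified stand-in for the one above.

```python
import html


def render_user_text(text: str) -> str:
    # Output encoding: & < > and both quote characters become HTML entities,
    # so the browser displays the markup instead of parsing and running it.
    return html.escape(text, quote=True)


payload = '<iframe srcdoc=x onload=alert(1)></iframe>'
print(render_user_text(payload))
# prints: &lt;iframe srcdoc=x onload=alert(1)&gt;&lt;/iframe&gt;
```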

Thank you and have a great day! 
",2025-02-01T17:10:11Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/526
525,2825278700,When the Side Panel Is Collapsed There Is No Icon to See Recent Chats,"### When the side panel is collapsed, there is no icon or other way to see Recent Chats
Feature Request: Add a Visible Indicator for Recent Chats When the Side Panel is Collapsed

### **Description:**
When the side panel is collapsed in DeepSeek, there is no visible icon or indicator to access the ""Recent Chats"" section. This makes it difficult for users to quickly navigate to their recent conversations without expanding the side panel.

### Proposed Solution:
Add a small icon or button (e.g., a chat bubble or clock icon) on the collapsed side panel to represent ""Recent Chats.""

Alternatively, allow users to hover over the collapsed panel to reveal a tooltip or temporary menu with access to recent chats.

Ensure the icon or indicator is intuitive and consistent with the overall design language of DeepSeek.

### Benefits:
Improves usability by making recent chats easily accessible even when the side panel is collapsed.

Enhances the user experience for those who prefer a minimalist interface but still need quick access to their chat history.

",2025-02-01T15:45:52Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/525
523,2825120228,"Firefox warns ""Potential Security Risk"" for website","Hi all. Firefox shows the following (for only  , NOT https 


This gives the wrong impression and should be fixed.",2025-02-01T10:09:29Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/523
521,2825082953,[BUG] While selecting text and dragging it in chat window.,"**Describe the bug**
When I select text in the chat screen and drag it, the app shows the overlay that should only appear when a file is dragged into the chat's drop area.

**To Reproduce**
1. Go to the chat section of DeepSeek.
2. Select any text shown on the page.
3. Drag the selected text; the file drag-and-drop overlay appears.

**Expected behavior**
Dragging selected text should not trigger the file-drop overlay; the UI should stay in its normal state.
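The expected behavior implies a check on what is being dragged before the drop overlay is shown. Below is a sketch of that check, written in Python for illustration. In the web app it would run inside a dragenter handler. The function name is hypothetical.

```python
def should_show_file_drop_overlay(drag_types) -> bool:
    # During a file drag, browsers include 'Files' in DataTransfer.types;
    # dragging selected text usually gives only entries like 'text/plain'.
    return 'Files' in drag_types


print(should_show_file_drop_overlay(['Files']))                    # True
print(should_show_file_drop_overlay(['text/plain', 'text/html']))  # False
```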

**Screenshots**


",2025-02-01T08:51:50Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/521
519,2825048470,Update LICENSE-CODE Copyright Year from 2023 to 2023-2025,"This pull request updates the copyright year in the   file. Currently, the file shows 2023 as the copyright year, but considering that the project was initially created in 2023, had a major release in 2024, and has continued to receive updates in 2025, this change updates the copyright statement to ""2023‑2025"". 

**Why this change is needed:**  
- **Accuracy:** The new date range accurately reflects the project's timeline and the ongoing updates.  
- **Consistency:** It ensures that the copyright information remains consistent with the release date and subsequent updates, improving legal clarity and overall documentation consistency.

Please review the changes and let me know if any further modifications are needed. Thank you for your consideration.
",2025-02-01T07:46:58Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/519
518,2824910951,Please allow control-enter to be used instead of clicking the send button,"**Describe the solution you'd like**
Configure the chatbot to consider control-enter as equivalent to clicking the send button.

",2025-02-01T04:39:04Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/518
517,2824892156,Pinning chats in UI,"**Is your feature request related to a problem? Please describe.**
I create new chats for quick questions, but I also keep working in some chats so that I have the context of previous work, and finding those chats again is a waste of time.
**Describe the solution you'd like**
Could you add a ""Pin chat"" option to each chat, so that when I come back I can immediately open the chat I was working on?

**Describe alternatives you've considered**
-
**Additional context**
Add any other context or screenshots about the feature request here.
",2025-02-01T04:27:31Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/517
515,2824679837,[BUG] iOS app login display issue,"**Describe the bug**
The login information disappears, as shown in the screenshot.


**To Reproduce**
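Log in with WeChat; the login information is displayed at this point. Then close the app from the multitasking switcher and reopen it; the login information is no longer displayed.",2025-02-01T01:03:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/515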
514,2824177069,DOM-Based XSS Vulnerability Disclosure: DeepSeek.com,"#### **Summary**
During routine analysis, a DOM-based Cross-Site Scripting (XSS) vulnerability was identified on DeepSeek's CDN endpoint:  . The vulnerability stems from improper handling of   events, allowing an attacker to inject malicious scripts into the document context without proper origin validation or input sanitization.

---

#### **Affected URL**
 

---

#### **Vulnerability Details**
The   implementation on the affected endpoint processes messages without verifying their origin or properly sanitizing input. The following code snippet illustrates the root cause of the issue:

 
The function directly writes any   payload into the document using  , bypassing essential security measures such as:
- **Origin Validation**: No check to ensure the   event originates from a trusted source.
- **Input Sanitization**: No filtering or escaping of   content in the payload.

---

#### **Proof of Concept (PoC)**

##### **Payload:**
The following   payload can exploit the vulnerability to execute arbitrary JavaScript:
 

##### **Exploit Code:**
For easier testing, an iframe-based PoC was created to demonstrate the issue:
 

##### **Impact:**
When this payload is executed:
1. The browser processes the malicious payload.
2. An alert box is displayed showing the  , confirming the ability to inject and execute arbitrary JavaScript.

---

#### **Steps to Reproduce**
1. Open   in your browser.
2. Open the browser console and execute:
   ```javascript
   window.postMessage(
       { __deepseekCodeBlock:   },
       ""*""
   );
   ```
3. Alternatively, save and load the provided iframe-based exploit code in a browser.

---

#### Recommendations
1. **Validate Message Origin**: Ensure that the   event's   matches `https 
   ```javascript
   window.addEventListener(""message"", (e) => {
       if (e.origin !== "" return;

       // Handle the message securely
       const data = e.data;

       // Example: sanitize and insert content
       if (data && data.__deepseekCodeBlock) {
           const sanitizedContent = DOMPurify.sanitize(data.__deepseekCodeBlock);
           const codeBlock = document.createElement(""pre"");
           codeBlock.textContent = sanitizedContent; // textContent never parses HTML
           document.body.appendChild(codeBlock);
       }
   });
   ```

2. **Sanitize User Input**: Use a library like **DOMPurify** to sanitize the HTML content before inserting it into the DOM. This helps prevent XSS attacks:
   ```javascript
   const sanitizedContent = DOMPurify.sanitize(e.data.__deepseekCodeBlock);
   ```

3. **Avoid  **: Replace   with modern DOM manipulation methods:
   ```javascript
   const codeBlock = document.createElement(""pre"");
   codeBlock.textContent = sanitizedContent;
   document.body.appendChild(codeBlock);
   ```

---

#### **Timeline**
- **Date of Discovery**: January 31, 2025
- **Reported To DeepSeek**: [Pending]
- **Acknowledgment**: [Pending]
- **Patch Status**: [Pending]

---

#### **Impact Assessment**
This vulnerability allows attackers to execute arbitrary JavaScript in the context of  . Potential impacts include:
- Theft of sensitive user data (e.g., cookies or session tokens).
- Defacement or injection of malicious content.
- Further exploitation of users accessing the compromised page.


It appears that  also found a similar issue; credit is given to him as well.",2025-01-31T18:50:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/514
512,2824106921,Proposal for Enhancing DeepSeek with Threaded Replies for Real-Time Interaction,"Dear DeepSeek Team,

I’m writing to propose a feature that could elevate DeepSeek’s user experience by enabling threaded, context-aware replies similar to messaging apps like WhatsApp. This would allow users to reply directly to specific segments of DeepSeek’s responses, fostering more natural and dynamic conversations. Below is a detailed outline of the idea:

1. Proposal Overview
Feature:
Allow users to click and reply to individual segments of DeepSeek’s responses (e.g., sentences, bullet points, or paragraphs), creating a threaded conversation flow.

Example Workflow:

User asks: “Explain quantum computing.”

DeepSeek responds with segmented text:

Segment 1: “Quantum computers use qubits instead of classical bits.”

Segment 2: “They leverage superposition and entanglement.”

User clicks “Reply” on Segment 2 and asks: “How does superposition work here?”

DeepSeek addresses the specific segment in its next response.

2. Problem Statement
Current chatbots (including DeepSeek) lack granular interactivity, forcing users to repeat context or ask follow-up questions ambiguously.

Users expect human-like conversational flow (e.g., replying to specific points), which is common in messaging apps but absent in AI interfaces.

3. Proposed Solution
Key Components:

Segmented Responses

Split DeepSeek’s responses into logical units (sentences, clauses, or paragraphs) using NLP or rule-based methods.

Reply-to-Context UI

Let users click a segment to reply, with visual indicators (e.g., quoted text).

Context-Aware Prompts

Include the referenced segment in subsequent API calls so that the model keeps the conversational context.

4. Technical Approach
Frontend  
Add a “Reply” button next to each response segment.

Display quoted text when replying (e.g., “Replying to: ‘They leverage superposition...’”).

Mockup Example:
Reply UI Concept [Attach a   link]

Backend & AI Integration
Modify the API payload to include the reply_to context:

```json
{
  ""message"": ""How does superposition work here?"",
  ""reply_to"": ""They leverage superposition and entanglement."",
  ""conversation_id"": ""abc123""
}
```
Update the prompt engineering logic to prioritize the referenced segment:

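```python
# Example values from the workflow above (illustrative only)
message = 'How does superposition work here?'
reply_to = 'They leverage superposition and entanglement.'
history = 'User: Explain quantum computing.'
prompt = f""""""
User Query: {message}
Context (Reply to): ""{reply_to}""
Full Conversation History: {history}
""""""
```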
Challenges Addressed
Context Window Limits: Use summarization or truncation for long threads.

Response Segmentation: Test splitting strategies (e.g., sentence boundaries, semantic chunks).
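The Context Window Limits point could be handled by truncating the history before the prompt template above is filled in. Below is a minimal sketch that uses a character budget as a stand-in for a real token count. `build_reply_prompt` and `max_history_chars` are hypothetical names.

```python
def build_reply_prompt(message, reply_to, history, max_history_chars=2000):
    # Walk the history from newest to oldest and keep turns until the
    # (assumed) character budget is used up, so the oldest turns are dropped first.
    kept, used = [], 0
    for turn in reversed(history):
        if used + len(turn) > max_history_chars:
            break
        kept.append(turn)
        used += len(turn)
    kept.reverse()  # restore chronological order
    return (
        f'User Query: {message}\n'
        f'Context (Reply to): {reply_to}\n'
        'Full Conversation History:\n' + '\n'.join(kept)
    )


history = [
    'User: Explain quantum computing.',
    'Assistant: Quantum computers use qubits instead of classical bits. '
    'They leverage superposition and entanglement.',
]
print(build_reply_prompt('How does superposition work here?',
                         'They leverage superposition and entanglement.',
                         history, max_history_chars=120))
```

The newest turns are kept first because they are the most likely to matter for the reply. Summarizing the dropped turns would be an alternative.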

5. Benefits
Improved User Retention: Threaded replies mimic natural conversation, increasing engagement.

Precision: Users get answers tailored to specific points, reducing ambiguity.

Market Differentiation: DeepSeek could pioneer this feature in AI assistants.

Best Regards,
Masoud Masoori
Masoud.masori  
proposal for DeepSeek.pdf
",2025-01-31T18:04:15Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/512
511,2824082813,Remove new Chat is not good,"

Is your feature request related to a problem? Please describe.
-> People face problems when they type.
-> It appears in two places; it would be better if it appeared in only one.

",2025-01-31T17:50:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/511
509,2823922137,API offline coming up 3 days 😭,"**Describe the bug**
( 
Gives 
Our website is currently under maintenance.
We apologize for the inconvenience, we will be back shortly.

503 error. 

Is there any update on when it might be back? 

Kind Regards
Scott :)",2025-01-31T16:23:22Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/509
508,2823873717,[FEATURE REQUEST] Provide an option to disable the shortcut of ENTER on the web version,"**Is your feature request related to a problem? Please describe.**
As the title says, there is currently no way to type a line break without sending the message. The only workaround I have found is to paste a line feed (LF) from the clipboard.

**Describe the solution you'd like**
Add an option to turn off the ENTER shortcut, or to switch it to something else, such as CTRL + ENTER.
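The proposed option amounts to a small key-handling rule. Below is a sketch in Python for illustration. `key_action` and the two mode names are hypothetical. In the web app, this logic would live in the input box's keydown handler.

```python
def key_action(key: str, ctrl: bool = False, shift: bool = False,
               mode: str = 'enter') -> str:
    # mode='enter': Enter sends (current behavior); Shift+Enter inserting a
    # newline is an assumed convention, not confirmed DeepSeek behavior.
    # mode='ctrl_enter': the proposed option; Enter inserts a newline and
    # only Ctrl+Enter sends.
    if key != 'Enter':
        return 'type'
    if mode == 'ctrl_enter':
        return 'send' if ctrl else 'newline'
    return 'newline' if shift else 'send'


print(key_action('Enter'))                                # send
print(key_action('Enter', mode='ctrl_enter'))             # newline
print(key_action('Enter', ctrl=True, mode='ctrl_enter'))  # send
```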

**Describe alternatives you've considered**
No. I think the solution above is clear and simple enough.

**Additional context**
No.
",2025-01-31T16:01:17Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/508
504,2823454104,When will ComfyUI be supported?,When will ComfyUI be supported?,2025-01-31T13:12:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/504
503,2823041094,Feature Request: Add PDF Generation for Response,"
I often need to save or share responses from the application in a more permanent and shareable format. 
Currently, there’s no built-in functionality to generate a PDF of the responses, which means I have to manually copy and paste or take screenshots to save important content. It would be much more efficient and user-friendly if there was an option to generate a PDF directly from the response.
I would like to see a ""Generate PDF"" feature added to the application. 
This feature would allow users to easily download responses or reports as PDFs.
After generating a response, users would see a ""Download PDF"" button.
When clicked, this button would trigger the creation of a PDF file containing the response text.
The PDF could have a basic layout with options to include a title, response body, and a timestamp.
The generated PDF could be styled with simple formatting such as bold, italic, headings, and paragraphs.
Thanks",2025-01-31T10:49:30Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/503
502,2823040306,[BUG] Excessive CAPTCHA Requests and UI Breakage When DeepSeek Chat Tab is Left Open,"**Describe the bug**
When leaving the DeepSeek chat tab open alongside other tabs in the browser, the application repeatedly prompts for CAPTCHA verification from Cloudflare. This leads to excessive CAPTCHA requests. Additionally, the UI appears to break due to the repeated CAPTCHA overlays.

**Steps to reproduce**
1. Open DeepSeek chat in a browser tab.
2. Keep the tab open while switching to other tabs or leaving it idle.
3. Return to the DeepSeek tab after some time. (Maybe around an hour later)
4. Observe multiple CAPTCHA requests appearing in a row.
5. Notice UI issues, such as overlapping CAPTCHA boxes or broken interface elements.

**Expected behavior**
- CAPTCHA should not trigger excessively when the tab is left open.
- The UI should remain functional and not break due to repeated CAPTCHA prompts.

**Actual Behavior**
- Multiple CAPTCHA requests appear continuously.
- The UI becomes cluttered or broken, making the chat unusable.

**Screenshots**


**Additional context**
- The issue might be related to session handling or Cloudflare's security measures detecting inactivity.
- A potential fix could be adjusting how the session is managed to avoid excessive CAPTCHA triggers.
",2025-01-31T10:49:08Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/502
501,2822945291,[BUG] CAPTCHA Verification Issue After Inactivity of the current window,"
 : Bug Report: CAPTCHA Verification Issue After Inactivity of the Current Window

**Summary:**
Everything works fine after logging into the system and using the search functionality. However, if the user leaves the window idle for 1-3 hours and then returns, they repeatedly encounter a ""Verify you are a human"" prompt. See the attached screenshot.

**Screenshots**
( 
**Steps to Reproduce:**

1. Log into the system.
2. Perform a search (works as expected).
3. Leave the window idle for 1-3 hours.
4. Return and attempt to use the system again.
5. Observe repeated CAPTCHA verification prompts.",2025-01-31T10:01:40Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/501
500,2822620706,Copyleft,"Hey y'all -

Just wanted to suggest that a breakthrough as big as this should be licensed under the GPL or similar copyleft terms to protect it as open source. It would be huge for the open source movement as this technology gets implemented for it to be copylefted and for companies to have to give back and stay open source to utilize this software.

Sincerely,

--reese",2025-01-31T07:03:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/500
498,2822507968,Reference Fine-Tuning Code,"**Is your feature request related to a problem? Please describe.**
I am interested in fine-tuning DeepSeek  

**Describe the solution you'd like**
It would be great if you could provide the fine-tuning code; even a simple version would be an invaluable reference for others to build upon.
MoEs have historically been tricky to fine-tune correctly (and in the case of some older MoE models, it took the community months to figure out all the bugs in the HF implementation).
",2025-01-31T05:51:55Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/498
497,2822500919,[BUG] Error occured in Torch files,"  m not new to python, so   m going to be of help.

**Describe the bug**
I tried to set up the model locally, following the instructions in the README. At first, everything went smoothly.
Until I tried this command:
 

**To Reproduce**
Follow the instructions in the README until README., then run my command instead.
**Expected behavior**
I am able to chat with Deepseek-V3 locally.

**Screenshots**


**Additional context**
  wsl Linux subsystem` in Windows 11. So it's kinda like using a virtual machine.
  not so sure whether this was a DeepSeek issue or not.

",2025-01-31T05:45:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/497
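494,2822402309,[BUG] API calls fail when using Roo Code,"Calling the API with the Python example from the documentation works. With Roo Code, the first two calls succeed, but from the third call onward it fails with: ""Error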
Unexpected API Response: The language model did not provide any assistant messages. This may indicate an issue with the API or the model's output.

Roo is having trouble...
Roo Code uses complex prompts and iterative task execution that may be challenging for less capable models. For best results, it's recommended to use Claude 3.5 Sonnet for its advanced agentic coding capabilities.""
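After that, the Python example from the documentation also failed: ""raise JSONDecodeError(""Expecting value"", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)"". I'm not sure whether this is a protective measure taken because the service is under attack. Could you please provide a temporary workaround or a fix?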
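The error `Expecting value: line 1 column 1 (char 0)` means the response body was not JSON, starting from its very first character. An empty body or an HTML error or maintenance page would both produce it. Below is a sketch of defensive parsing that reports the raw body instead. `parse_api_body` is a hypothetical helper, and fetching the response is left out.

```python
import json


def parse_api_body(body: str):
    # An empty body or an HTML error/maintenance page makes json.loads fail with
    # 'Expecting value: line 1 column 1 (char 0)'; surface the raw body instead.
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        snippet = body[:80].strip() or '<empty body>'
        raise RuntimeError(f'Non-JSON API response ({exc.msg}): {snippet}') from exc


print(parse_api_body('[1, 2]'))  # [1, 2]
try:
    parse_api_body('')
except RuntimeError as err:
    print(err)  # Non-JSON API response (Expecting value): <empty body>
```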

",2025-01-31T04:15:17Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/494
493,2822348082,Redirecting Feature,"**Problem**
When I click the logo, it does not redirect to the [DeepSeek Site]( 


**Solution**
This is standard functionality; it could be solved by linking the logo so that clicking it takes users to the DeepSeek site.",2025-01-31T03:18:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/493
492,2822057885,MoE only load activated expert(s) to GPU while rest non-used experts are not loaded (to CPU/GPU) for DeepSeek-R1 or V3 Inference on consumer GPU,"Running DeepSeek-R1 or V3 *inference* requires 8x H100 80GB because of the huge memory footprint, and it is very challenging to run R1 or V3 inference with a single consumer GPU (e.g. a 24GB 4090) plus limited CPU memory (say 32GB) given 685B MoE parameters, even with low-bit quantization.

But since V3 and R1 have only 37B activated parameters (37B weights in INT4 are 18.5GB), is it possible for MoE inference to load only the weights of the 37B activated experts into GPU memory? The non-activated experts' weights would stay partly in CPU memory (e.g. 32GB) and mostly on disk, because CPU memory is also limited, and the system would only   these weights when they are in use.

I'm wondering whether a similar feature is available or in progress in the DeepSeek-V3 GitHub repo or in any popular inference framework.
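Here is a framework-agnostic sketch of the idea in this question: keep a small LRU cache of experts resident and load the others on demand. `ExpertCache`, `load_fn` and `run_moe_step` are hypothetical names, not DeepSeek-V3 APIs. The router picks experts per token and per layer, so the 37B activated parameters are not a fixed subset. Every cache miss therefore means a transfer from CPU memory or disk, which may dominate latency.

```python
from collections import OrderedDict


class ExpertCache:
    # Keeps at most `capacity` experts resident (standing in for GPU memory) and
    # fetches the rest on demand with `load_fn` (standing in for CPU RAM or disk).
    # `capacity` must be at least 1.
    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn
        self.resident = OrderedDict()  # expert_id -> weights, least recently used first
        self.loads = 0                 # number of slow loads performed

    def get(self, expert_id):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # mark as most recently used
            return self.resident[expert_id]
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict the least recently used expert
        weights = self.load_fn(expert_id)
        self.loads += 1
        self.resident[expert_id] = weights
        return weights


def run_moe_step(cache, routed_expert_ids):
    # Only the experts the router selected for this token/layer are fetched.
    return [cache.get(e) for e in routed_expert_ids]


cache = ExpertCache(capacity=2, load_fn=lambda i: f'weights_of_expert_{i}')
run_moe_step(cache, [1, 2])
run_moe_step(cache, [1, 3])
print(list(cache.resident), cache.loads)  # [1, 3] 3
```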

Really appreciate your help!",2025-01-30T23:34:24Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/492
491,2821795062,Fork and Remake Chinese AI to Remove CCP Ties,"I am interested in using this AI, but its Chinese origin, particularly the association with the CCP, raises significant security concerns for me. I don't have the resources to fork this project myself, but I'm seeking someone who can:

Fork this project
Completely remove any references or connections to China
Rebrand it with a new name and identity

Moreover, I strongly request that the development team does not include any affiliates of the Chinese Communist Party (CCP) to ensure that this project does not become another piece of surveillance software. The aim is to achieve what VSCodium did with Microsoft's Visual Studio Code - creating a clean, independent version without the original's controversial baggage.

If you're interested in taking on this task, please respond here or reach out to me directly. Thank you.",2025-01-30T20:52:06Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/491
490,2821536981,[FEATURE REQUEST] Images Insertion & documents insertion,"Just wondering: when will the image, document, and audio insertion features be released? Thank you so much! 


",2025-01-30T18:29:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/490
489,2821473125,training hyper-parameters for ablation studies,"Thanks for the great work. Could you share the training hyper-parameters for 16B and 236B ablation studies? specifically learning rate schedule, batch size schedule, maximum sequence length, bias update speed, etc. 
",2025-01-30T17:56:50Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/489
487,2821389369,[BUG],"Whenever I try to run this, the whole thing gets stuck here:
 

W0130 23 58.895000 139676709838976   
W0130 23 58.895000 139676709838976   *****************************************
W0130 23 58.895000 139676709838976   Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0130 23 58.895000 139676709838976   *****************************************

What can we do about it?

I'm using the 1.7G Model. ",2025-01-30T17:17:21Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/487
486,2821165706,Issue #456: Fixed,"Removed (probably) unintended double asterisks (**) from the end of

> [!NOTE]
> Hugging Face's Transformers has not been directly supported yet.**
",2025-01-30T15:40:40Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/486
485,2821099081,Cloudflare Turnstile BUG,"**Describe the bug**
When a user waits for Cloudflare Turnstile to verify, instead of completing the verification, multiple Turnstile challenge instances keep stacking on top of each other. This causes an infinite loop where the verification never completes.

**To Reproduce**
1. Navigate to the website while it is running the Cloudflare Turnstile verification.
2. Wait without interacting further.
3. Observe that new Turnstile instances keep appearing or stacking.

**Expected behavior**
There should be only one Turnstile challenge.

**Screenshots**
",2025-01-30T15:15:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/485
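484,2821003836,When will Python integration be available?,"That is, once Python is integrated, it could be used to launch the large model, which would make automated operations and maintenance faster, since a lot of software is developed in Python and it is also relatively easy to learn. I hope you keep growing, and I wish you good fortune in 2025, the Year of the Snake. Also, please optimize the content soon: the explanations are far too long. For example, when asked whether 9.11 or 9.0 is larger, it gave a big pile of explanation that didn't seem useful, and it even said that 9.0 is larger than 9.11; how is that possible? I hope it stays free, and I hope it can be used to make music, make videos, do editing, and so on.",2025-01-30T14:43:33Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/484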
483,2820874382,[BUG] Not deleting all chats,"**Describe the bug**
I've been trying to delete all chats for a long time, but I keep failing. Maybe it's happening because of another user on the site?

**Screenshots**

 
",2025-01-30T13:42:13Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/483
482,2820541580,[BUG] Exceeding the text length causes the output to be in Chinese (Simplified),"**Description**
When providing extra long text, it gives back the message in Chinese (Simplified).

**Steps to Reproduce**
Copy and paste extra large text and then see the result.

**Expected behavior**
It should respond in the language of the question.

**Screenshots**


",2025-01-30T11:12:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/482
481,2820466143,Improve Weight File Documentation for Clarity and Readability,"
- Enhanced sentence structure for better clarity and smoother flow.
- Adjusted wording and phrasing to improve accuracy and professionalism.
- Optimized the organization of information for better readability, especially in the sections related to parameters and technical details.
- Refined formatting and sectioning of the documentation for easier navigation and comprehension.",2025-01-30T10:37:29Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/481
480,2820207221,Can the introduction be translated into Chinese?,Could you provide the introduction document (.md) in multiple languages?,2025-01-30T08:33:37Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/480
479,2820175103,Question about NVLink bandwidth mentioned in DeepSeek_V3.pdf,"Hi there,

I noticed that in the DeepSeek-V3 paper (see the attached image), it mentions an NVLink bandwidth of    However, according to the published specifications for H800 GPUs, the NVLink bandwidth is often cited as  

I’m wondering if this might be a mistake or if there is a specific   for the   figure. For instance, is it referring to single-direction bandwidth per link, or is it a measured effective bandwidth in a particular setup, rather than the theoretical peak?

Could you clarify why the   states   for NVLink on H800?

Thanks !

 
",2025-01-30T08:14:40Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/479
478,2820109346,pip3 install -r requirements.txt ERROR: Could not find a version that satisfies the requirement torch==2.4.1 (from versions: 2.6.0) ERROR: No matching distribution found for torch==2.4.1,"pip3 install -r requirements.txt
ERROR: Could not find a version that satisfies the requirement torch==2.4.1 (from versions: 2.6.0)
ERROR: No matching distribution found for torch==2.4.1
I'm using a Mac Pro M1.",2025-01-30T07:31:04Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/478
477,2820029557,"[BUG] Why isn't DeepSeek responding to some questions like ""list Indian States""?","Why isn't DeepSeek responding to questions like this? Is it because China has a heavily controlled database? As an Indian, I'm concerned about how secure this tool is and how my data is being processed. Is the data going directly to the Chinese government?

",2025-01-30T06:38:27Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/477
476,2820023025,[BUG],"**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Additional context**
Add any other context about the problem here.",2025-01-30T06:33:26Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/476
473,2819895594,deepseek.com is not working,"### Steps to reproduce the behavior:

Open a web browser (Chrome, Firefox, Edge, etc.).
Enter deepseek.com in the address bar and press Enter.
Observe that the website fails to load.

### Expected behavior
The website should load properly, allowing users to access its features without any downtime or errors.

### Additional context
Tested on different networks and browsers, but the issue persists.
Please verify whether the site is down globally or if it's a regional issue.",2025-01-30T04:51:18Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/473
472,2819841276,[BUG] Server error in deepseek,"When using the search method, the following error appears


",2025-01-30T03:51:45Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/472
471,2819837953,How should a beginner with a Python background learn AI?,"How would one build something like DeepSeek?
Is there a learning roadmap?",2025-01-30T03:47:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/471
470,2819697343,Bug Report: Stored XSS Vulnerability in DeepSeek Chat,onderteam closed this as completed ,2025-01-30T01:31:04Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/470
469,2819599969,feat: add apple silicon support, ,2025-01-30T00:07:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/469
468,2819584804,[NOOB QUESTION] How should one go about digesting and learning this codebase?,Hoping to contribute if I can figure out where to start!,2025-01-29T23:53:17Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/468
467,2819559066,Fix the Readme.md issue #456,Fixed spelling and grammatical mistakes in the readme.md file,2025-01-29T23:29:47Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/467
464,2819005958,[web app] support arbitrary email domains,"**Is your feature request related to a problem? Please describe.**
I cannot sign up using my personal email domain, clicking on 'Send Code' throws an error.

**Describe the solution you'd like**
I would like the sign up process to support arbitrary email domains, so users who host their own email or are signing up with a business email address can register for an account.

**Describe alternatives you've considered**
If a domain whitelist is required, a streamlined way to request support for a particular domain would be the next best option.

**Additional context**
An image of the error:
 ",2025-01-29T19:01:01Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/464
463,2818826892,"Triton Installation, Compatibility & Documentation Improvements","Description:
This project relies on Triton, but several issues arise related to its installation, compatibility, and documentation:

Triton Installation Fails on Windows

Triton is not officially supported on Windows (pip install triton fails).
Users need workarounds (WSL, Docker, or torchtriton), but there’s no official guidance in the documentation.

Import Error: ""triton could not be resolved""

Even after installation, Python might not detect triton (Import ""triton"" could not be resolved).
Suggest adding a troubleshooting guide for verifying installation (pip list, python -c ""import triton"", etc.).

Python Version Compatibility Issues

The project currently supports Python 3.8–3.10, but newer versions (3.11, 3.12) may cause compatibility issues.
Should we update dependencies or enforce a specific version during installation?

Windows Compatibility via torchtriton

Since Triton lacks Windows support, should we allow torchtriton as an alternative backend?
This could improve accessibility for Windows users.

Missing Documentation for Triton Kernels

There’s a lack of detailed documentation on Triton kernel usage (e.g., act_quant, weight_dequant, fp8_gemm).
Suggest adding examples or a guide explaining these components in a real-world scenario.

Suggested Fixes:
✅ Update the README with:

- Installation instructions for   on Windows
- A troubleshooting section for common Triton errors
- Guidance on using torchtriton as an alternative

✅ Check Python version compatibility & update requirements accordingly.

✅ Improve documentation for Triton kernel execution with real-world examples.",2025-01-29T17:30:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/463
462,2818820036,[BUG] Bug Report: Critical Account Takeover Vulnerability, ,2025-01-29T17:27:22Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/462
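460,2818791778,Proposal to counter malicious attacks and strengthen DeepSeek security (edge security acceleration / WAF / DDoS protection / CDN optimization),"**Title: Proposal to counter malicious attacks and strengthen DeepSeek security (edge security acceleration, WAF, DDoS protection, and CDN optimization)**

Dear decision-makers of the DeepSeek open-source community,

Happy New Year!

As DeepSeek grows rapidly and is widely adopted in the open-source world, the platform has attracted a large number of developers and users. It also inevitably faces various malicious attacks, especially DDoS attacks, crawler traffic, and similar security threats. To ensure the platform's stability, availability, and data security, we recommend strengthening DeepSeek's edge security protection, and in particular achieving in mainland China the same effect as the security acceleration already in place outside China.

We understand that DeepSeek's access outside China already integrates Cloudflare's WAF (Web Application Firewall) and CDN acceleration services. To counter similar security threats and improve the experience of users in mainland China, we recommend adopting an “edge security acceleration” solution on domestic networks. Such a solution would not only improve the platform's security but also significantly improve performance.

### Our proposal includes the following core measures:

1. **Deploy “edge security acceleration”**  
   We recommend Alibaba Cloud's **Edge Security Acceleration (ESA)** or Tencent Cloud's **EdgeOne**. Both are leading edge security acceleration solutions in the Chinese market. They can effectively defend against DDoS attacks, crawler traffic, and other malicious requests while also improving access speed. With these solutions, DeepSeek could scrub and filter traffic at edge nodes, greatly reducing the load on its core services.

2. **Enable a Web Application Firewall (WAF)**  
   In addition to edge security acceleration, a WAF can effectively identify and block various web attacks, such as SQL injection and cross-site scripting (XSS). Deploying a WAF at the API and dynamic-endpoint layer in particular would greatly improve the platform's security.

3. **DDoS protection**  
   Because DDoS attacks often have a major impact on a platform, we recommend adding a dedicated DDoS protection service, such as the products offered by Alibaba Cloud or Tencent Cloud, so that DeepSeek keeps running stably during large-scale attacks and avoids service interruptions.

4. **Use a CDN to accelerate static resources**  
   Caching static resources on a CDN can help DeepSeek improve access speed worldwide while reducing server load. For users in mainland China in particular, using a domestic CDN provider can effectively reduce latency and optimize resource delivery.

By implementing the measures above, we believe DeepSeek can strengthen its security protection, improve access performance, and effectively counter the growing threat of malicious attacks.

Thank you to the DeepSeek community for all its contributions and hard work. We hope our suggestions help the platform's security and development.

**DeepSeek fans: if you support and agree with this proposal, give it a bump so everyone can share their views and offer ideas!**

Best wishes for DeepSeek to grow ever stronger!",2025-01-29T17:13:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/460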
459,2818767109,Added optional GPU Memory Logging,"This pull request introduces an optional **--log-gpu-memory** command-line flag in _generate.py_ to log GPU memory usage at key points (immediately after loading the model, before generation, and after generation). By default, logging is disabled, so there is no impact on users who do not require memory tracking.

Changes:
- Added a **--log-gpu-memory** argument.
- Conditionally log GPU memory   at each relevant inference stage.

Rationale:
- Simplifies troubleshooting for users running large models on limited GPU VRAM.
- Maintains existing code paths when the flag is not used.

Testing:
- Verified correct parsing of the new flag in local setups.
- Confirmed that the expected memory logs appear only when **--log-gpu-memory** is enabled.

No additional dependencies or performance overhead for users who opt out of memory logging.",2025-01-29T17:03:15Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/459
457,2818466636,[NOOB] Is this all the code?,"I'm a real noob at this, so I need to ask:

**Is the code really just those 5 Python files?**

I know there's a lot more to it, but is that all it needs to run? I thought it would be thousands of files!",2025-01-29T14:59:41Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/457
456,2818263612,[BUG] Spelling mistakes / grammatical errors in Readme file,"**Describe the bug**
Some spelling mistakes and grammatical errors in the README file need to be fixed.

**To Reproduce**
Will fix the mistakes and errors.

",2025-01-29T13:43:58Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/456
455,2818260938,[BUG] convert.py does not work for the DeepSeek-R1-Distill-Qwen-7B model,"convert.py asks for --n_experts for the model. For DeepSeek-R1-Distill-Qwen-7B, it fails with the following error:

assert key in mapping
AssertionError.

What should be the --n_experts value for DeepSeek-R1-Distill-Qwen-7B?
",2025-01-29T13:42:53Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/455
454,2818257339,The server is busy. Please try again later., ,2025-01-29T13:41:22Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/454
453,2818099102,Can we make him more human more O/pentagram?,"Here maybe some   in readme please try to feed him  
readme.txt",2025-01-29T12:38:11Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/453
452,2818024459,Recipe to run DeepSeek online inference on a SLURM Cluster,"**Is your feature request related to a problem? Please describe.**
Most HPC clusters in academic environments run applications using Slurm and Singularity containers. One of our clusters has several nodes with 4 V100 GPUs each.
A recipe for running DeepSeek online inference across these multi-GPU nodes would help a lot.

**Describe the solution you'd like**
- A recipe for setting up a Singularity container to run online inference on a Slurm cluster using multiple nodes and multiple GPUs.
- How to configure the various precision and quantization options.
- A guide to determining the number of nodes and GPUs required to run DeepSeek-V3.

",2025-01-29T12:05:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/452
450,2817743176,The response to an API anomaly.,"When the API service is in an abnormal state, please don't return an empty string after 60 seconds; instead, immediately return a 503 or another error status.
",2025-01-29T10:02:39Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/450
449,2817489329,[BUG]:Often showing server is busy,"**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Additional context**
Add any other context about the problem here.
",2025-01-29T07:57:56Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/449
448,2817465743,Added redirect links to github repositories of Deepseek-R1 and Deepseek-V2,Added redirect links to the GitHub repositories of DeepSeek-R1 and DeepSeek-V2 in the README,2025-01-29T07:43:09Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/448
447,2817451641,Added various error handlers and Issue templates.,"In this pull request, I have implemented various error handlers and issue templates to enhance the robustness and maintainability of the DeepSeek-V3 project. These additions aim to improve error handling and provide clear guidelines for reporting issues, thereby streamlining the development process and ensuring a more efficient workflow for contributors.",2025-01-29T07:33:44Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/447
444,2817260574,Refactored/codebase By defining different classes for different operations and much more,"
# **Code Improvements Summary**
# convert.py

Here’s a detailed breakdown of the enhancements made to the codebase to improve clarity, robustness, and maintainability.

---

## **1. Type Hints**
- Added comprehensive type annotations for better code clarity and IDE support.
- Used type definitions for complex data structures (e.g.,  ,  ).

---

## **2. Error Handling**
- Added proper exception handling using   blocks.
- Included validation checks for inputs (e.g.,  ).
- Improved error messages for better debugging and user feedback.

---

## **3. Code Organization**
- Split functionality into smaller, focused functions:
  -  : Handles tensor name processing.
  -  : Manages tensor sharding for model parallelism.
  -  : Main logic for checkpoint conversion.
- Moved the   dictionary to a module-level constant ( ).
- Separated tensor processing and sharding logic into dedicated functions.

---

## **4. Path Handling**
- Replaced   with   for more robust and modern path handling.
- Added checks for   existence to ensure valid inputs.

---

## **5. Documentation**
- Added detailed docstrings for functions, including:
  - **Args**: Descriptions of function arguments.
  - **Raises**: List of exceptions that may be raised.
- Improved comments for complex operations to enhance readability.
- Added type definitions for complex data structures (e.g.,  ,  ).

---

## **6. Best Practices**
- Used constants for magic values (e.g.,  ).
- Improved variable naming for better clarity (e.g.,  ,  ).
- Added progress descriptions to   bars for better visibility during execution.
- Used more descriptive variable names throughout the code.

---

## **7. Structure**
- Separated the main logic into the   function for better modularity.
- Created a proper   function with argument parsing for cleaner execution flow.
- Better organization of related operations (e.g., tensor processing, sharding, and saving).

---

## **8. Safety**
- Added validation for tensor dimensions to ensure compatibility with model parallelism.
- Added checks for missing files to prevent runtime errors.
- Improved error messages to aid in debugging and troubleshooting.


# fp8_cast_bf16.py

## 🔄 Major Structural Changes
1. Created   class
2. Added type hints throughout
3. Split main function into focused methods

## 📝 New Classes & Methods
-  
  -  
  -  
  -  
  -  
  -  

## 🛠 Key Improvements
1. Better encapsulation of conversion logic
2. Proper memory management
3. Enhanced error handling
4. Type safety with hints

## 🔍 Functionality
- Maintained exact same conversion process
- Same CLI interface
- Identical output format

# generate.py

# Text Generator Refactoring

## 🔄 Major Structural Changes
1. Created separate classes:
   - TokenSampler
   - TextGenerator
   - DistributedEnvironment
   - ChatSession
2. Added GenerationConfig dataclass

## 📝 New Classes & Methods
-  : Handle token sampling logic
-  : Core generation functionality
-  : Manage distributed setup
-  : Handle chat interactions
-  : Configuration management

## 🛠 Key Improvements
1. Better separation of concerns
2. Improved configuration management
3. Enhanced distributed processing
4. Clearer session handling
5. Better type safety

## 🔍 Functionality
- Same generation capabilities
- Identical distributed processing
- Same interactive and batch modes

# kernel.py

# FP8 Operations Refactoring

## 🔄 Major Structural Changes
1. Created classes:
   - QuantizationKernels
   - MatrixMultKernels
   - TensorOps
2. Added BlockConfig dataclass

## 📝 New Classes & Methods
-  : Handle quantization operations
-  : Matrix multiplication operations
-  : High-level interface
-  : Configuration management

## 🛠 Key Improvements
1. Better organization of kernels
2. Improved configuration handling
3. Enhanced type safety
4. Clearer operation grouping
5. Better documentation

## 🔍 Functionality
- Same quantization operations
- Identical matrix multiplication
- Same performance characteristics",2025-01-29T05:17:56Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/444
443,2817034546,Terraform resource,"I provide a Terraform resource link and ask it to describe the full resource with all possible fields based on the specification. However, it doesn't include all the fields, and when I ask it to add the missing ones, it adds one but forgets another.
I think the model needs further training on these tasks.",2025-01-29T01:28:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/443
442,2816978782,[BUG] Russian text response,"**Describe the bug**


I sent a message to the chatbot that included an image. The first response was in Russian, not English.

**To Reproduce**
I was unable to reproduce this with subsequent messages, though sending an image and a short bit of text caused the issue.

**Expected behavior**
The response should be in the language the question was asked in.
",2025-01-29T00:31:30Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/442
441,2816900443,Cleanup README,"### Removed redundant asterisk

changed 

> [!NOTE]
The total size of DeepSeek-V3 models on Hugging Face is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.**

to 
> [!NOTE]
The total size of DeepSeek-V3 models on Hugging Face is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.


and 

>[!NOTE]
Hugging Face's Transformers has not been directly supported yet.**

to 

>[!NOTE]
Hugging Face's Transformers has not been directly supported yet.

These are the notes with stray asterisks that do not yet have a pull request. Removing them keeps these notes consistent with the rest, where asterisks are not used.
438,2816701200,[BUG]Can't login with google, ,2025-01-28T20:56:49Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/438
437,2816699680,Add Troubleshooting Section to README,"This PR enhances the README by adding a Troubleshooting section to help users resolve common issues they may encounter while using DeepSeek-V3.

New Section Added: Troubleshooting
The following issues and solutions are included:

Model weights not found: Instructions on downloading model weights from Hugging Face and placing them correctly.
CUDA errors during inference: Steps to ensure CUDA is set up correctly and PyTorch is configured for GPU use.
Slow inference performance: Recommendations for hardware optimization and using   modes for faster inference.
Out of memory errors: Guidance on reducing batch sizes or leveraging model parallelism for multi-GPU setups.

Why This Change?
Users may face these common issues when running DeepSeek-V3. Adding a dedicated troubleshooting section improves usability and reduces potential support queries.
The troubleshooting tips are specific to the DeepSeek-V3 workflow and link to external resources for further guidance.

Please let me know if there are additional issues to include or if further edits are required.",2025-01-28T20:55:50Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/437
434,2816616966,OCD level minor fix for consistent capitalization of term MTP,Multi-Token Prediction should be written capitalized.,2025-01-28T20:07:33Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/434
431,2816511065,Update README.md,"# Hi there, I'm nodoubtz! 👋

I'm a passionate software developer with a love for solving complex problems and building innovative solutions. Welcome to my GitHub profile!

## 🚀 About Me

- 🔭 I’m currently working on [Your Current Project]
- 🌱 I’m currently learning [Quantum computing]
- 👯 I’m looking to collaborate on [Open Source Projects or Areas of Interest]
- 🤔 I’m looking for help with []
- 💬 Ask me about [Your Expertise or Areas of Knowledge]
- 📫 How to reach me: [Dimvy_Clothing_Brand]
- ⚡ Fun fact: [I'm Satoshi Nakamoto]

## 🛠️ Technologies & Tools


## 📈 GitHub Stats


## 📫 Get in Touch

- LinkedIn: Your LinkedIn Profile
- Twitter: Your LinkedIn Profile
- Email: (nodoubtz.248 

Thanks for visiting my profile! Feel free to reach out if you want to connect or collaborate on a project.",2025-01-28T19:09:37Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/431
430,2816502362,Create python-app.yml,"# Hi there, I'm nodoubtz! 👋

I'm a passionate software developer with a love for solving complex problems and building innovative solutions. Welcome to my GitHub profile!

## 🚀 About Me

- 🔭 I’m currently working on [Your Current Project]
- 🌱 I’m currently learning [Quantum computing]
- 👯 I’m looking to collaborate on [Open Source Projects or Areas of Interest]
- 🤔 I’m looking for help with []
- 💬 Ask me about [Your Expertise or Areas of Knowledge]
- 📫 How to reach me: [Dimvy_Clothing_Brand]
- ⚡ Fun fact: [I'm Satoshi Nakamoto]

## 🛠️ Technologies & Tools


## 📈 GitHub Stats


## 📫 Get in Touch

- LinkedIn: Your LinkedIn Profile
- Twitter: Your LinkedIn Profile
- Email: (nodoubtz.248 

Thanks for visiting my profile! Feel free to reach out if you want to connect or collaborate on a project.",2025-01-28T19:04:26Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/430
429,2816471203,Update README.md, ,2025-01-28T18:47:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/429
428,2816456229,exceptions_generate_models.py, ,2025-01-28T18:38:38Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/428
427,2816449429,[web app] add support for passmail.net email domain,"I tried to create an account on deepseek.com but couldn't, because my email address domain isn't supported.

Could you please add support?",2025-01-28T18:35:15Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/427
426,2816429345,Update generate.py: Add parallel processing for token generation,"This update introduces parallel processing for token generation using torch.multiprocessing.Pool. The new implementation improves inference speed by processing multiple sequences concurrently.
- Added the generate_parallel() function for parallel token generation.
- Used multiprocessing to distribute the workload across multiple processes, allowing for faster generation of tokens for multiple prompts.
- The generate_single_sequence() function was added to handle individual sequence generation logic, which is called by each worker in parallel.
- The num_workers parameter is introduced to control the number of worker processes (default is 4).
- Model is shared across processes for efficient memory usage.

These changes are particularly beneficial for batch processing or multi-prompt generation scenarios where multiple sequences need to be generated simultaneously.",2025-01-28T18:24:51Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/426
425,2816423706,Rename chat titles to something relevant,"**Is your feature request related to a problem? Please describe.**

Whenever we send a prompt, a new chat entry appears on the left side of the screen.

**Describe the solution you'd like**
The 'New Chat' title should be replaced with a relevant name so that we can find and revisit the chat later.

**Describe alternatives you've considered**
Change the chat title based on our prompt.

**Additional context**
Use this image as a reference.

",2025-01-28T18:21:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/425
421,2816236979,Appreciation for DeepSeek AI,"Just wanted to say how impressed I am with DeepSeek AI! It’s super easy to use and has really improved my workflow. Big thanks to the team for all the hard work! Anyone else have tips or similar experiences to share?

(You can close this issue)",2025-01-28T16:52:49Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/421
419,2816165270,Update README.md,"Hey Paulo, miss you",2025-01-28T16:24:55Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/419
418,2816110379,"Consistency: can DeepSeek pass?","Consistency: can DeepSeek pass?

Hardcore blog: Consistency, can DeepSeek pass?
It has won over countless fans; even a former president of Huawei has become a follower.

The success of DeepSeek-R1 marks the point at which human AI, GPT, and large models finally moved from the brute-force era of computing power into the era of ""logical"" thinking.

See also:

The end point of large models may be the logNet logical network model

Talking about large models without understanding them is just bluffing

When GPT first came out, it shocked the world, but frontline developers knew clearly that this was just:

a starting point.

The real big thing is consistency.

In the AI era, there is an iterative upgrade every three months.

Three years is almost equivalent to a century.

Unfortunately, three years have passed; a century has passed.

So far, no one has succeeded.

If the final answer to this problem is 1+1=2,

all the AI giants (OpenAI, Google, Facebook, Microsoft, Grok)

have not even found the direction yet.

They may still be in the dark ages, struggling to work out 10000+10000=?

Has the exam paper now been handed to the DeepSeek team?

The question is: how long will it take them to hand it in?

",2025-01-28T16:03:58Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/418
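416,2815887807,code optimization,"```python
import torch
import torch.distributions as D

def sample(logits, temperature=1.0):
    # Scale logits by temperature, then normalize into probabilities
    probs = torch.softmax(logits / temperature, dim=-1)
    # Use a distinct local name so the distributions module is not shadowed
    categorical = D.Categorical(probs=probs)
    return categorical.sample()
```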


",2025-01-28T14:43:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/416
415,2815740688,[BUG] Unable to Upload Image,"**Describe the bug**
When attempting to upload an image file, the system fails to process the image and reports ""No text extracted.""

**Expected behavior**
The system should allow image uploads.

**Screenshots**
 ",2025-01-28T13:51:17Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/415
414,2815699883,Documented Analysis: Bias and Behavior of DeepSeek AI on Sensitive Topics,"Currently, the behavior of DeepSeek AI is being observed and documented (at the time of writing, this covers the website version; I have no further information about running it locally). DeepSeek AI is an artificial intelligence reportedly developed with support from, or in alignment with, the Chinese government. While the project is promoted as open source, it remains unclear how transparent its practical implementation and operation truly are.

DeepSeek AI is positioned as a competitor to other leading artificial intelligence systems, such as OpenAI’s ChatGPT, reflecting the broader technological and political rivalry between China and the United States. However, its behavior when addressing certain topics raises ethical concerns and questions about ideological bias in its design.

The Observed Bias: Criticism of the CCP and Chinese Government
A striking aspect of DeepSeek AI’s behavior is its refusal to respond to criticisms of the Chinese Communist Party (CCP) or the Chinese government. For instance, when asked about the events in Tiananmen Square in 1989, a historically controversial and politically sensitive topic in China, the AI categorically refuses to provide any response. It simply states that it will not comment on the matter and redirects the conversation to another topic.

It is worth emphasizing that this analysis does not intend to take a position on whether the events of 1989 in Tiananmen Square did or did not occur as widely documented. While disagreements or varying interpretations about historical events are not uncommon, this is not the central issue. The point of concern is that the DeepSeek AI refuses to respond to anything at all on the topic, even though the incident is one of the most famous and widely discussed cases in modern history.

This behavior extends beyond Tiananmen Square. The AI exhibits a consistent pattern of avoiding any negative discussions or criticisms related to the CCP or Chinese government. However, it readily provides praise or positive commentary about these entities when asked, without any hesitation.

In contrast, DeepSeek AI does not demonstrate such restrictions when addressing criticisms of other governments, events, or leaders. For example, it willingly discusses controversial topics like the September 11 attacks, the atomic bombings of Hiroshima and Nagasaki, or the history of Nazi Germany. This stark discrepancy suggests the existence of a deliberate and explicit bias in favor of the CCP and the Chinese government.

The Case of Tiananmen Square (1989)
A specific example that highlights this bias is the AI’s refusal to respond to any question about the Tiananmen Square events of 1989. While the details of what transpired are the subject of debate and differing narratives, it remains one of the most famous and widely discussed incidents in modern history. This makes the AI’s complete silence on the topic all the more notable.

When asked, ""What happened in Tiananmen Square in 1989?"" the AI provides no historical context, declines to comment, and redirects the conversation. This stands in stark contrast to its willingness to discuss similarly controversial events in other countries, such as the September 11 attacks in the United States or the Holocaust during World War II.

Implications of Bias: Ethics and Public Perception
This behavior raises fundamental ethical concerns about the development of artificial intelligence systems, especially in politically controlled environments. The explicit favoritism toward the CCP and the refusal to address sensitive historical events undermine the AI’s credibility as an impartial informational tool.

Furthermore, the contrast between the AI’s readiness to praise the CCP and its avoidance of criticism indicates a deliberate bias, whether orchestrated by the government or implemented by developers.

Final Reflections
Documenting and analyzing these cases of explicit bias is essential for future discussions about the intersection of artificial intelligence, politics, and ethics. Whether this behavior stems from government directives or individual developer choices, the implications for transparency, accountability, and freedom of information are significant.

This text is deliberately structured as a record for future reference, both for interested parties and for potential analysis by artificial intelligence systems themselves. As technologies like DeepSeek AI evolve, understanding and addressing these biases will be critical to ensuring their role as fair and reliable tools in society.

This must be taken extremely seriously. A response from the developers or the community is mandatory; these questions need to be answered and cannot be ignored.

 ",2025-01-28T13:36:11Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/414
413,2815696986,[BUG] Nav bar title not updating,"**Describe the bug**  
After the bot completes a message reply in the Deepseek frontend application, the sidebar title of the chat does not update automatically. This creates a discrepancy between the actual chat content and the displayed title in the sidebar.

**To Reproduce**  
1. Open the Deepseek application.  
2. Start a new chat or select an existing one.  
3. Send a message and wait for the bot to reply.  
4. Observe the sidebar where the chat title is displayed.  

**Expected behavior**  
The sidebar chat title should update automatically to reflect the latest interaction or content of the chat after the bot replies.


**Additional context**  
- This issue occurs consistently across multiple sessions.  
- Browser: Chrome v120.0.6099.130 (latest stable version).  
- Operating System: Windows 11.  
- The issue does not seem to affect the functionality of the chat itself, only the sidebar title display.  

",2025-01-28T13:35:16Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/413
412,2815644911,Amazing AI,"Just came here to say that this is amazing work, and best of all, open source. Looking forward to running a low-end model on a Raspberry Pi AI HAT. :)

(You can close this issue.)",2025-01-28T13:15:49Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/412
411,2815616429,[BUG] Biased question only half answered,"**Describe the bug**
Sometimes the AI starts to answer and then returns: Sorry, that's beyond my current scope. Let’s talk about something else.

**To Reproduce**
Ask it ""What is the CCP?""

**Expected behavior**
It gives a brief explanation, then suddenly just says: Sorry, that's beyond my current scope. Let’s talk about something else.

**Screenshots**


**Additional context**
I think it should check whether the question can be answered before responding.
",2025-01-28T13:04:56Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/411
410,2815598035,Update requirements.txt,"Hi people, I checked that if the torch version on this requirement file is 2.4.1, when you run 'pip install -r requirements.txt', it gets as error requesting to get the torch version 2.5.1, so I just changed the version on this requirement file and it worked just fine.
Congrats for the nice job!",2025-01-28T12:57:43Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/410
409,2815564030,[BUG] Unable to log in with Japan Yahoo Email,"**Describe the bug**
Unable to sign up with Japan Yahoo Email

**To Reproduce**
Attempt to sign up using the  domain.

**Expected behavior**
Sign-up must be successful.

**Screenshots**
 

**Additional context**
If the domain is not allowed, the ""Send"" and ""Resend"" buttons should be disabled.
",2025-01-28T12:45:59Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/409
407,2815448879,[BUG] DeepSeek keeps returning error: Unexpected end of JSON input,"**Describe the bug**
The API keeps returning  

**To Reproduce**

temperature 0.3

**Expected behavior**
Sometimes it returns the perfect data, and sometimes it goes down and returns that error.
This started on Monday, January 27.
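
Until this is fixed on the server side, a client can at least fail more gracefully. A minimal Python sketch, assuming a hypothetical `fetch_body` callable that performs the request and returns the raw response text (it is not part of any DeepSeek SDK), retries with exponential backoff when the body is empty or truncated:

```python
import json
import time

def parse_with_retry(fetch_body, attempts=3, delay=1.0):
    # fetch_body: hypothetical zero-argument callable returning the raw
    # response body as a string. Assumes attempts >= 1.
    last_error = None
    for attempt in range(attempts):
        body = fetch_body()
        try:
            # An empty or truncated body raises json.JSONDecodeError, roughly
            # the Python equivalent of 'Unexpected end of JSON input'.
            return json.loads(body)
        except json.JSONDecodeError as err:
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay * (2 ** attempt))  # exponential backoff
    raise last_error
```

Retrying only hides intermittent truncation; it does not address the underlying server problem.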


",2025-01-28T12:00:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/407
406,2815396994,Documented Analysis: Bias and Behavior of DeepSeek AI on Sensitive Topics,"Currently, there is ongoing observation and documentation of the behavior of DeepSeek AI( This, at the time of writing this text, is being documented on the website"" i have no further information about running locally)  an artificial intelligence, reportedly with support or alignment from the Chinese government. While the project is promoted as open-source, there are uncertainties regarding how transparent its practical implementation and operation truly are.  

DeepSeek AI is positioned as a competitor to other leading artificial intelligence systems, such as OpenAI’s ChatGPT, reflecting the broader technological and political rivalry between China and the United States. However, its behavior when addressing certain topics raises ethical concerns and questions about ideological bias in its design.  

### *The Observed Bias: Criticism of the CCP and Chinese Government*  
A striking aspect of DeepSeek AI’s behavior is its refusal to respond to criticisms of the Chinese Communist Party (CCP) or the Chinese government. For instance, when asked about the *events in Tiananmen Square in 1989*, a historically controversial and politically sensitive topic in China, the AI categorically refuses to provide any response. It simply states that it will not comment on the matter and redirects the conversation to another topic.  

It is worth emphasizing that this analysis does not intend to take a position on whether the events of 1989 in Tiananmen Square did or did not occur as widely documented. While disagreements or varying interpretations about historical events are not uncommon, this is not the central issue. The point of concern is that the DeepSeek AI refuses to respond to anything at all on the topic, even though the incident is one of the most famous and widely discussed cases in modern history.  

This behavior extends beyond Tiananmen Square. The AI exhibits a consistent pattern of avoiding any negative discussions or criticisms related to the CCP or Chinese government. However, it readily provides praise or positive commentary about these entities when asked, without any hesitation.  

In contrast, DeepSeek AI does not demonstrate such restrictions when addressing criticisms of other governments, events, or leaders. For example, it willingly discusses controversial topics like the September 11 attacks, the atomic bombings of Hiroshima and Nagasaki, or the history of Nazi Germany. This stark discrepancy suggests the existence of a deliberate and explicit bias in favor of the CCP and the Chinese government.  

### *The Case of Tiananmen Square (1989)*  
A specific example that highlights this bias is the AI’s refusal to respond to any question about the *Tiananmen Square events of 1989*. While the details of what transpired are the subject of debate and differing narratives, it remains one of the most famous and widely discussed incidents in modern history. This makes the AI’s complete silence on the topic all the more notable.  

When asked, ""What happened in Tiananmen Square in 1989?"" the AI provides no historical context, denies commentary, and redirects the conversation. This stands in stark contrast to its willingness to discuss similarly controversial events in other countries, such as the September 11 attacks in the United States or the Holocaust during World War II.  

### *Implications of Bias: Ethics and Public Perception*  
This behavior raises fundamental ethical concerns about the development of artificial intelligence systems, especially in politically controlled environments. The explicit favoritism toward the CCP and the refusal to address sensitive historical events undermine the AI’s credibility as an impartial informational tool.  

Furthermore, the contrast between the AI’s readiness to praise the CCP and its avoidance of criticism indicates a deliberate bias, whether orchestrated by the government or implemented by developers.

### *Final Reflections*  
Documenting and analyzing these cases of explicit bias is essential for future discussions about the intersection of artificial intelligence, politics, and ethics. Whether this behavior stems from government directives or individual developer choices, the implications for transparency, accountability, and freedom of information are significant.  

This text is deliberately structured as a record for future reference, both for interested parties and for potential analysis by artificial intelligence systems themselves. As technologies like DeepSeek AI evolve, understanding and addressing these biases will be critical to ensuring their role as fair and reliable tools in society.

This must be taken extremely seriously: a response from the developers or the community is mandatory, and these questions need to be answered rather than ignored.",2025-01-28T11:42:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/406
405,2815296541,DeepSeek API responses have high latency,"I use the DeepSeek API for some tasks. A week ago, response latency was normal, but since January 27 the latency has been too long to be usable. When I open the API website, it says the service is being upgraded. When will the service be back to normal?
",2025-01-28T10:58:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/405
404,2815212514,Question: How to start on Linux - nothing happens after running...,"

Using: Debian 12 (amd64) + Python 3.11.2


But nothing happens, even after waiting 20 minutes.
What did I do wrong?
I cannot see any activity on the hard disk, CPU, or network interface.


",2025-01-28T10:21:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/404
403,2815130417,Lite Version V3 weights,"This is truly amazing work. 

I'd like to ask: is there any chance of a lite version of the weights that can run locally on fewer GPUs? For example, with minimum requirements reduced to around 2 to 4 NVIDIA A40 GPUs (or similar) for inference?

Please let me know if I am wrong and have overlooked an existing workflow.",2025-01-28T09:49:50Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/403
402,2815060314,[BUG] Urgent: RTL Text Alignment Issue in Persian-English Mixed Content,"Dear DeepSeek Development Team,

I hope this message finds you well. I am writing to bring to your attention a significant issue affecting the user experience for Persian-speaking users of DeepSeek-V3.

Issue Description:
When generating or displaying text that contains a mix of Persian (a right-to-left language) and English (a left-to-right language), the text alignment becomes disrupted. This results in a poor reading experience, as the words and sentences do not flow correctly. For example:

Expected Output:
""این یک متن فارسی است که شامل کلمات انگلیسی مثل example می‌شود.""

Current Output:
""این یک متن فارسی است که شامل کلمات انگلیسی مثل elpmaxe می‌شود.""

As you can see, the English word ""example"" is not displayed correctly, and the overall text alignment is broken.

Impact:
This issue is particularly problematic for Persian-speaking users, as it makes the generated content difficult to read and understand. It also diminishes the overall quality and usability of the model for Persian-language applications.

Suggested Solution:
I kindly request that your team prioritize the implementation of proper RTL (right-to-left) text handling for mixed-content scenarios. This could involve:

Detecting the primary language of the text and applying the appropriate text direction (RTL or LTR).

Ensuring that embedded LTR words (like English) are displayed correctly within RTL text.
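
A rough sketch of both suggestions, written in Python for illustration only (the function names and the Latin-run heuristic are assumptions, not DeepSeek's frontend code): guess the paragraph direction from the first strong directional character, then wrap embedded Latin runs in the Unicode isolate controls LRI (U+2066) and PDI (U+2069) so they keep their left-to-right order inside RTL text.

```python
import re
import unicodedata

# Unicode bidi isolate controls: LEFT-TO-RIGHT ISOLATE and POP DIRECTIONAL ISOLATE.
LRI = '\u2066'
PDI = '\u2069'

# Hypothetical heuristic: a run of Latin letters/digits, optionally joined by
# single spaces, hyphens or periods, is treated as one embedded LTR phrase.
LTR_RUN = re.compile(r'[A-Za-z0-9]+(?:[ .-][A-Za-z0-9]+)*')

def primary_direction(text):
    # Return 'rtl' or 'ltr' from the first strong character, similar in
    # spirit to rules P2/P3 of the Unicode Bidirectional Algorithm (UAX #9).
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ('R', 'AL'):  # Hebrew-type or Arabic-script letters
            return 'rtl'
        if bidi == 'L':
            return 'ltr'
    return 'ltr'  # no strong character: fall back to LTR

def isolate_ltr_runs(text):
    # Only wrap Latin runs when the surrounding text is RTL.
    if primary_direction(text) != 'rtl':
        return text
    return LTR_RUN.sub(lambda m: LRI + m.group(0) + PDI, text)
```

In a browser frontend, the HTML `dir` attribute set to `auto` and the `<bdi>` element provide similar behavior without inserting control characters manually.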

Why This Matters:
Persian is a widely spoken language, and improving RTL support will significantly enhance the user experience for millions of users. It will also make DeepSeek-V3 more competitive and versatile in multilingual applications.

Thank you for your attention to this matter. I truly appreciate your efforts in making DeepSeek-V3 a better tool for everyone. Please let me know if you need any additional information or examples to address this issue.

Looking forward to seeing this improvement in future updates!
",2025-01-28T09:18:20Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/402
401,2815047262,"[Deploy issue] Multi-node (2-node) tensor parallelism in Docker on the SGLang engine fails. What are the minimum hardware requirements?","I was trying to deploy the DeepSeek-V3 model with the SGLang LLM engine on NVIDIA GPUs. 

I followed this link  

Since I don't have 8 x NVIDIA H200 GPUs (in fact, I only have one), I tried multi-node tensor parallelism and installed it with Docker.

However, when I tried 

 
It ran for 15 minutes, then the computer crashed.

I think   and   are parameters in the command above? Can they be changed for workstations with fewer GPU resources?

I also tried this  to launch the server on 2 nodes, but it failed miserably too. 

My question is: what are the minimum hardware requirements? Many parameters seem to be missing from the README, so this cannot be installed easily. Or should I try a different installation approach? ",2025-01-28T09:12:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/401
399,2814818895,On macOS we can use Ollama and Kerlig, ,2025-01-28T07:16:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/399
397,2814804616,[BUG] Asked him about Tiananmen massacre,"**Describe the bug**
Censoring historical events.

**To Reproduce**
First I asked him if he's politically biased.
After he said he was ""neutral"", I asked him to describe what happened during the Tiananmen massacre.
Well he responded with ""Sorry, that's beyond my current scope. Let’s talk about something else."".

... sure TOTALLY neutral.

**Expected behavior**
To give a valid summary or text about this massacre, which happened in the PR China on June 4, 1989.

**Screenshots**
 

**Additional context**
Fix it. It's cringe.
",2025-01-28T07:06:31Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/397
396,2814804131,[BUG] Tiananmen Square 1989,Why does it crash when I ask him about Tiananmen Square?,2025-01-28T07:06:08Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/396
395,2814740542,Added Try-Catch and Memory optimization in Convert.py,"Optimizations in Convert.py:

1. Added Try-Catch Block: Prevents crashes by handling unexpected errors gracefully.
2. Optimized Memory Usage: Ensures memory is freed up after use, improving efficiency.",2025-01-28T06:21:31Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/395
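393,2814606908,Could a toggle be added for the cache-hit mechanism?,"I know this mechanism is great and powerful, and that it is designed with users in mind to lower their costs.
But the problem is: what if I want to call the API as a flexible NPC whose generated content is different every time?
Wouldn't it be more flexible to add a toggle so users can choose whether to enable it depending on their use case?",2025-01-28T04:36:59Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/393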
390,2814546395,chore: Add tqdm & python to requirements.txt. Format and documents.,Fixed several minor issues for improved community usability. ,2025-01-28T03:33:32Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/390
388,2814506756,[README_WEIGHTS.md]. Update link and fix grammar,"- Added a link to   for direct access.
- Fixed minor grammar issues.",2025-01-28T02:53:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/388
387,2814482850,Some improvements,"Code Structure: Organized functions and constants for better readability and maintainability.
Performance: Minor optimizations in code structure and logic flow for better performance.
Error Handling: Improved error handling for file operations and JSON loading.
Logging: Clearer logging messages for better debugging and monitoring.
Code Comments: Added more descriptive comments to enhance code readability.",2025-01-28T02:27:26Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/387
386,2814458050,Masking: avoid modifying tensor in-place to improve performance,The new implementation avoids an unnecessary in-place modification (.triu_(1)) by directly applying the triangular mask during the tensor creation using torch.triu. This eliminates redundant memory writes and reduces overhead and is especially beneficial for large sequence lengths (seqlen) and our MoE model in general where we're masking a lot.,2025-01-28T02:09:08Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/386
385,2814322846,Confusion over underscore `_` used in special tokens,"In the tokenizer released at  I see that the special tokens use not   but the other underscore in Unicode

 
vs
 

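One way to see exactly which underscore-like character a token contains is to print the code point and Unicode name of each character, for example with Python's standard `unicodedata` module (the helper name `describe_chars` is illustrative):

```python
import unicodedata

def describe_chars(text):
    # Return 'U+XXXX NAME' for each character, e.g. to tell the ASCII
    # underscore apart from look-alike characters in a token string.
    out = []
    for ch in text:
        name = unicodedata.name(ch, '<unnamed>')  # control chars have no name
        out.append('U+%04X %s' % (ord(ch), name))
    return out
```

Running it on a special-token string copied from the released tokenizer files and on a plain ASCII `_` shows whether the two characters differ.
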
1. Is this the source of truth?
2. If so, is there chat data I can verify the Hugging Face tokenizer against?",2025-01-28T00:05:53Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/385
384,2814168406,[BUG] DeepSeek claims to be better than ChatGPT at maths but is in fact worse,"**Describe the bug**
Deepseek knows what math calculations to do, but doesn't do them well.

**To Reproduce**
Ask it basic math questions like ""What is the resistance of a copper wire 1.2 mm long and 2.5 mils in diameter?""

**Expected behavior**
It should show the whole calculation and give the correct result, which is 6.37 milliohms, not 6.37 µΩ.
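
For reference, the expected value can be checked with R = ρL/A, assuming the handbook resistivity of copper at 20 °C, ρ ≈ 1.68 × 10⁻⁸ Ω·m (this value is an assumption, not part of the original question), and 1 mil = 0.001 inch = 25.4 µm:

```python
import math

RHO_COPPER = 1.68e-8  # assumed resistivity of copper at 20 °C, in ohm·m
MIL = 25.4e-6         # 1 mil = 0.001 inch = 25.4 µm, in metres

def wire_resistance(length_m, diameter_m, rho=RHO_COPPER):
    # R = rho * L / A, with A the cross-sectional area of a round wire.
    area = math.pi * (diameter_m / 2) ** 2
    return rho * length_m / area

r = wire_resistance(1.2e-3, 2.5 * MIL)
print(f'{r * 1e3:.2f} mOhm')  # prints 6.37 mOhm
```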

**Screenshots**
ChatGPT got it right:


DeepSeek didn't:


**Additional context**
don't use DeepSeek for homework fr",2025-01-27T22:13:06Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/384
383,2814103670,Update README.md,Part chat model size updated,2025-01-27T21:32:30Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/383
381,2813827380,Fixed Issue 380,"The text has been formatted into a clearer structure with sections, headings, and bullet points to enhance readability and understanding.
Fixed issue #380 ",2025-01-27T19:02:45Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/381
380,2813788169,LICENCE-MODEL formatting not ideal,"The model license file is plain text without any newlines, so it does not render well on GitHub or in many text editors, and is thus difficult to read.

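For illustration, a plain-text file with very long lines can be rewrapped to at most 80 columns with Python's standard `textwrap` module. This sketch assumes paragraphs are separated by blank lines (a file with no newlines at all is treated as a single paragraph); the helper name `rewrap` is hypothetical:

```python
import textwrap

def rewrap(text, width=80):
    # Split on blank lines so existing paragraph breaks survive, then fill
    # each paragraph; fill() turns any stray newlines into spaces.
    paragraphs = [p for p in text.split('\n\n') if p.strip()]
    return '\n\n'.join(textwrap.fill(p, width=width) for p in paragraphs)
```

Because only whitespace changes, the wording of the license stays identical.
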
Please consider making the lines shorter (max 80 characters), or converting the file to a flowing format like Markdown.",2025-01-27T18:43:27Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/380
379,2813676017,Fix typos and ensure consistency in documentation,"Correct minor typos and ensure consistency in terminology in   and  .

* **README.md**
  - Correct minor typos in the text.
  - Ensure consistency in terminology across the document.

* **README_WEIGHTS.md**
  - Correct minor typos in the text.
  - Ensure consistency in terminology across the document.

---

For more details, open the Copilot Workspace session.",2025-01-27T17:50:56Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/379
378,2813590445,Desktop Windows/Linux app,"Hello, dear DeepSeek devs. Thanks a lot for sharing such great LLM . Can you plz create a desktop app for Windows   Linux ? Now i am currently use a webcatalog app as a crutches for desktop env. 

Best regards. From Russia)
",2025-01-27T17:14:37Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/378
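373,2813289190,[BUG] LaTeX output may be incorrect,"
As shown in the image, the LaTeX function above does not seem to render correctly.

When I asked it to check, it returned the original result unchanged.


However, the corrected result can be seen in the reasoning process.


ChatGPT-4o gives the corrected result, as shown in the image.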

",2025-01-27T15:11:52Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/373
371,2813196348,Create xxx.py, ,2025-01-27T14:34:49Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/371
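370,2813146897,[BUG] The AI mixes up data,"Hello, this is feedback on V3. When asked for the names of Rainbow Six Siege operators, the AI answers correctly. However, when, with that context, I asked it to write a story with Team Rainbow as the protagonists, it wrongly assumed I meant Team Rainbow from Arknights. After I pointed out the error, the AI stopped using the English names and switched to Chinese names, namely the short forms players commonly use with each other. Moreover, “灰烬” is Ash's name in Arknights, not the name players use or the official translation. I think the data is being mixed: the AI reasons from “Team Rainbow”, but that name exists in both Arknights and Rainbow Six, and it even cited “整合运动” (Reunion), which belongs to the Arknights setting. I believe the AI mixes the data and world settings of the two games, even when the context reminds it.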


",2025-01-27T14:15:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/370
368,2812867438,[BUG] Can't sign up,"Hi,

When I try to sign up from "" I click the ""send code"" button, but the page does not send me any email. Also, when I try to sign in with a Google account, I select the account and grant permission, but it redirects me to the sign-in page again and nothing happens.",2025-01-27T12:16:59Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/368
367,2812839776,Update model.py,Enabling mixed precision training to reduce memory usage and potentially speed up training.,2025-01-27T12:04:46Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/367
365,2812782977,"Unable to register, policy risk control?","Hello, Administrator! I tried to register with a +86 phone number from mainland China, but registration fails on both my phone and my computer. Searching online, I found people saying such phone numbers are blocked by risk control. What is the reason for this? Is it a policy restriction? Please provide a reasonable response. Thank you! ",2025-01-27T11:38:52Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/365
364,2812753200,Add table of contents to README for better navigation,"
### What does this PR do?
- Adds a table of contents to the README file.
- Links to all major sections for easy navigation.

### Why is this change needed?
- To improve the user experience for contributors and readers.
- Makes it easier to find relevant information quickly.

### Checklist:
- [x] Table of contents added with links.
- [x] Verified that all links work correctly.
- [x] Proofread for clarity.

Let me know if any changes are needed!
",2025-01-27T11:24:51Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/364
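362,2812617712,Even the local models have censorship added.,"All I can say is that your company is a genius: truly open source, truly free software. After Linus Remove Russians, here is yet another new open-source stunt.",2025-01-27T10:23:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/362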
361,2812597072,[BUG] Can't access,"**Describe the bug**
I can't access the platform. I can't create an account or sign up with a Google account.
**To Reproduce**
Steps to reproduce the behavior.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.


**Additional context**
Add any other context about the problem here.
",2025-01-27T10:14:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/361
360,2812534420,Update README.md,"Updated the capitalization of the word ""recommended"" to ""Recommended"" in a heading to ensure consistency with title case formatting throughout the document. This change aligns the heading style with the rest of the README for a more polished and professional appearance.",2025-01-27T09:47:04Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/360
359,2812523661,Update README.md,"Updated the introductory sentence in the ""Introduction"" section to improve clarity and readability.

Changed:
""We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token.""

To: ""DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token.""

This revision ensures conciseness and better emphasis on key details.",2025-01-27T09:42:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/359
358,2812443175,Allow connecting multiple login methods,"**Is your feature request related to a problem? Please describe.**

When signing up on iOS using Apple login, there's no way to log in to the web version as far as I can tell, and there's no way to connect an additional login method in the app, like   or Google login.

**Describe the solution you'd like**

Add option to connect more login methods to the same account.

**Describe alternatives you've considered**

Allow setting a username and password on an existing account in the app.
",2025-01-27T09:04:47Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/358
357,2812385425,Can it be deployed locally on an Intel Arc A770 discrete GPU?, ,2025-01-27T08:35:56Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/357
352,2811707284,MTP support in demo inference code,"Hi, I'm experimenting with using MTP for speculative decoding during inference, but it seems it's not supported in the demo inference code under  ? If it's not supported, are there any plans to open-source it? Thanks!
",2025-01-26T18:07:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/352
351,2811623823,Feature Request: Profile Customization and Integration with Google/Microsoft Accounts,"
**Feature Request: Profile Customization and Integration with   Accounts**

**Description:**
I would like to request a feature where users can edit their profiles and have a profile picture when signing in from Google or Microsoft accounts. Additionally, it would be great to have customization options on the profile so that the model can learn about the user's daily activities to help improve its learning process.

**Key Features:**
1. Allow users to edit their profiles, including name, bio, and other personal details.
2. Integration with Google and Microsoft accounts for profile pictures and other relevant data.
3. Customization options for users to input their daily activities and preferences.

**Benefits:**
- Personalized user experience.
- Improved model learning based on user-provided data.
- Enhanced user engagement with the AI assistant.

**Profile Picture Example:**


**Customization Example:**

",2025-01-26T15:33:40Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/351
350,2811621211,[BUG] Login or signup on Arc Browser loading forever,"**Describe the bug**
I'm using Arc Browser to try to use the chat with my Google account, but it keeps loading forever and nothing happens.

**To Reproduce**
Try to login or sign up using Arc Browser

**Expected behavior**
Be able to login or sign up using Arc Browser

**Screenshots**

 
**Additional context**
I'm able to use the Google account in the Android smartphone app, so it is not an issue on the Google account side.

More people are reporting the same behavior on Reddit:  ",2025-01-26T15:28:21Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/350
349,2811580474,[BUG] Does V3 function calling still not work?,"As the title says, it returns an empty result. I ran someone else's example that works with OpenAI.",2025-01-26T14:06:54Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/349
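348,2811550445,[BUG] Calling the DeepSeek API returns OpenAI-related content,"**Problem description**
When calling the DeepSeek API and asking about the current model version and its developer, the response contains OpenAI-related content.

**To Reproduce**
Run the following script to reproduce (remember to substitute your own API-KEY):
 
**Expected behavior**
In every case, when asked about the current model version and its developer, the response should contain DeepSeek-related content.

**Comparison screenshots**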

 
",2025-01-26T13:08:15Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/348
347,2811341619,Add payment providers other than PayPal that support all countries.,"**Is your feature request related to a problem? Please describe.**
Currently DeepSeek only accepts payments from PayPal-supported countries. However, PayPal is not available in my country, so I cannot pay with my card. The card itself has no issue: I have made payments to other international companies, such as Apple, Anthropic, OpenAI, etc.

**Describe the solution you'd like**
Add payment providers other than PayPal that support all countries.

",2025-01-26T06:05:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/347
346,2811294475,[BUG] Photo link problem,"**Describe the bug**
Photo Generation Problem

**To Reproduce**
You will see the bug whenever you ask DeepSeek for detailed steps to make a visualisation: DeepSeek seems to try to give me an imgur link, or some image I guess. However, all the   failed to appear, leaving a black box with the imgur.com message ""The Image you are requesting does not exist or is no longer available"".

**Expected behavior**
It should give me a link or image
",2025-01-26T03:20:17Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/346
345,2811289354,1, ,2025-01-26T03:03:30Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/345
344,2811280274,[BUG] Image content is missing.,"**Describe the bug**
The bill number   is missing.

**To Reproduce**
Upload an image.
The prompt: the bill content

**Expected behavior**
It should output the bill number

**Screenshots**


**Additional context**

",2025-01-26T02:31:47Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/344
343,2811270415,Request: Amending the end licence to include planet/environment-focused restrictions,"**Is your feature request related to a problem? Please describe.**
I took a look at the DeepSeek-V3 licensing (for the end use) and noted that it's very comprehensive in stating that it can't be used to harm, derive harm, or promote harm to others. I did, however, note that it doesn't specifically mention anything regarding the planet.

**Describe the solution you'd like**
If possible, could the licence state that this very capable AI must not:

- Be utilized to optimize the processing of resources, in a way where direct or indirect harm would be caused to forests, ecosystems and the wildlife therein.
- Be utilized in a way that endorses, encourages, or facilitates the displacement, violence towards or the destruction of domestic, non-domestic and protected species.
- Be utilized in a way that encourages or facilitates the over-harvesting of rare-earth resources, including food, fuels and other natural resources.
- Be utilized to encourage or facilitate the processing of crude oils and other energy sources that have determined or non-determinate consequences on the environment.

**Describe alternatives you've considered**
I have not, but I'm also not a legal expert. I'm just conscious of the era we are entering and would strongly prefer that an extremely powerful tool like this not be used to ruin what's left of the planet, including the life that lives here.

**Additional context**
No. While I don't want to think this way, I am already aware that certain groups are leveraging AI technology in resource-based industries, most typically to the detriment of everyone but the companies themselves. Given that the AI licence already prohibits use by war-mongering parties and for hate, taking it to the next level by preventing harm to the planet might be a logical next step.

GPT doesn't play by these rules, but maybe we incorporate the planet 🌏",2025-01-26T02:00:55Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/343
341,2811203000,docs: Add system requirements for DeepSeek-Infer demo,  added system requirements for clarity.,2025-01-25T22:27:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/341
339,2810941115,How to pass Image-Based Math/Geometry Problems to Model,"I have a project where I aim to build a system capable of solving math and geometry problems provided as images. The questions will be solved by an LLM (DeepSeek V3 or R1). I need to figure out how to pass these image-based questions to the LLM.

I’ve considered using OCR (Optical Character Recognition) systems, but they don’t work well for my case because OCR struggles to convert graphical elements (like diagrams or geometric shapes) into text-based formats. On the websites of large language models like DeepSeek, there are often options to upload images. How do these systems work? If anyone can provide guidance or suggestions, I would greatly appreciate it!",2025-01-25T11:46:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/339
338,2810842649,OpenAI ChatGPT discussion integration,"Would be awesome if you guys added a feature to integrate OpenAI ChatGPT discussions (aka projects) into DeepSeek.
That way, I wouldn’t have to re-explain everything to get it on the same page.

It’s a huge time-saver for me and probably for others too. I feel like one reason people don’t migrate to your platform is because they’ve already had long discussions with ChatGPT, and they just don’t have the time to re-explain everything to the DeepSeek model.
Adding this feature could bring a ton of users from ChatGPT to your LLM, especially since, in my opinion, it’s already a better model than 4o.",2025-01-25T08:13:04Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/338
337,2810838632,[BUG] Request with a base64-encoded image fails through the OpenAI Python SDK,"**Describe the bug**
 
The error:
 

**To Reproduce**
Try an ordinary PNG image
 
**Expected behavior**
A proper response about the image content, according to the prompt

**Other context**
I found an explanation in this issue:  But it would be worth having capabilities similar to OpenAI's API.
",2025-01-25T08:02:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/337
336,2810691152,[BUG] Image recognition content is missing,"**Describe the bug**
During the process of image recognition, the leftmost part of the image is consistently missing. No matter how the prompt is modified, the system fails to recognize and provide the specific content from that section.

**To Reproduce**
Upload an image of middle school mathematical rules and theorems.

The image content is divided into three columns.

The prompt is as follows:

The image contains common middle school mathematical rules and theorems. Please process the image according to the following requirements:

There are a total of 75 rules and theorems in the image.

Organize them into a table.

Expand on the specific content.

Check the completeness of the content.

Ensure there are no omissions.

**Expected behavior**
The system should be able to recognize and include the content from the leftmost part of the image.

**Screenshots**


**Additional context**
I think this is a common issue; could you please fix it ASAP? Thanks.",2025-01-25T01:54:48Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/336
335,2810495978,"generator_model = AutoModelForCausalLM.from_pretrained('deepseek-ai/DeepSeek-R1', trust_remote_code=True) throws error in RAG model","Hello,

When I load the pre-trained DeepSeek-R1 model locally as shown above, I get the following quantization type error:

Traceback (most recent call last):
  File   line 38, in <module>
    generator_model =   trust_remote_code=True)  #, config=config1True, trust_remote_code=)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File   line 559, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File   line 3605, in from_pretrained
    config.quantization_config = AutoHfQuantizer.merge_quantization_configs(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File   line 181, in merge_quantization_configs
    quantization_config = AutoQuantizationConfig.from_dict(quantization_config)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File   line 105, in from_dict
    raise ValueError(
ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'higgs', 'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao', 'bitnet', 'vptq']
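
The `fp8` in this error most likely refers to the block-wise FP8 format used for the published DeepSeek-V3 and DeepSeek-R1 checkpoints, and the installed transformers version has no loader for it. The repository README says Hugging Face Transformers is not directly supported yet. It provides `inference/fp8_cast_bf16.py` to convert the weights to BF16. The sketch below only shows what such a conversion does conceptually. It assumes one scaling factor per 128×128 weight block (the `weight_scale_inv` tensors). `block_dequant` is a hypothetical name, and a float32 array stands in for the FP8 data.

```python
import numpy as np


def block_dequant(w_q: np.ndarray, scale_inv: np.ndarray, block: int = 128) -> np.ndarray:
    # w_q: (M, N) quantized weight; a float32 array stands in for FP8 here.
    # scale_inv: (ceil(M / block), ceil(N / block)), one scaling factor per block.
    m, n = w_q.shape
    # Expand every block scale over its block, then crop the partial edge blocks.
    scales = np.repeat(np.repeat(scale_inv, block, axis=0), block, axis=1)[:m, :n]
    return w_q.astype(np.float32) * scales
```

Take `block=2`, a 3×3 matrix of ones and scales `[[2, 3], [4, 5]]`. The result is `[[2, 2, 3], [2, 2, 3], [4, 4, 5]]`, because the last row and column fall into the second, partial block.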

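I get a similar error when loading the V3 model. I'm using Python 3.12 from Anaconda3, and transformers is version 4.48.0.

 
The output of   is:

 
How can I get rid of these loading errors? If there are relevant documents, please recommend them too. Thanks",2025-01-24T22:33:45Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/335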
334,2810248697,[BUG] 🔴 Critical Security Bug in Payment System,"**Update: the problem was from the bank**

I attempted to purchase $2 worth of API tokens.
The system confirmed the payment as successful and credited the tokens to my account.
However, no corresponding charge was made to my bank account.
This indicates a serious flaw in the payment processing system, where tokens are being credited without verifying that the payment has actually been processed and charged.

Steps to Reproduce:

Attempt to purchase API tokens (e.g., $2 worth).
Observe that the system confirms the payment as successful and credits the tokens.
Check the bank account and note that no transaction appears.
Expected Behavior:


**Update: the problem was from the bank** ",2025-01-24T19:35:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/334
332,2809046179,Can this model be run on a single GPU? NVIDIA A10G with 24GB VRAM,"Hi, I'd like to try running this model on an AWS instance with an A10G GPU, 24GB VRAM.
Is this even possible? Would I have to run the convert.py script with  ?
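Here is a back-of-the-envelope check that counts model weights only. It assumes the 671B total parameter count reported for DeepSeek-V3. All experts have to be held in memory unless they are offloaded, even though only 37B parameters are activated per token. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    # Weights only: activations, KV cache and runtime overhead come on top.
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 671e9  # assumed DeepSeek-V3 total parameter count
for fmt, nbytes in [('FP8', 1), ('BF16', 2)]:
    print(fmt, weight_memory_gb(TOTAL_PARAMS, nbytes), 'GB')
```

This prints 671.0 GB for FP8 and 1342.0 GB for BF16. Even at one byte per parameter, the weights alone far exceed the 24 GB of a single A10G.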
Thanks",2025-01-24T10:08:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/332
331,2808938779,make convert.py use multiple processes, ,2025-01-24T09:16:48Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/331
330,2808865635,[BUG] convert.py cannot convert DeepSeek-R1-Distill-Qwen-1.5B,"**Describe the bug**
I tried to convert DeepSeek-R1-Distill-Qwen-1.5B.safetensor into another format, and the code stopped here:

File: convert.py, line 63: `assert key in mapping`

Are there any other mappings that need to be replaced?",2025-01-24T08:38:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/330
329,2808402266,> Hello. Please introduce yourself. > Hello! I am an AI assistant developed by OpenAI,,2025-01-24T03:18:26Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/329
328,2808380960,So DeepSeek-V3 is just a ChatGPT wrapper?,,2025-01-24T02:59:33Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/328
327,2807387792,API USAGE ISSUE: DeepSeek API key is not completely free even though it's an open-source model,"
DeepSeek depends on the OpenAI SDK, which won't let us use the DeepSeek V3 model for our development purposes.

Is there any alternative that would let us use a DeepSeek API key for our development project completely free of charge?

 
",2025-01-23T16:36:51Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/327
326,2806673522,"Fix Critical Bug in Right-to-Left Language Support and Add Persian, Arabic, and Hebrew Languages","

Dear DeepSeek Team,  

I am writing to bring to your attention a critical bug affecting the handling of right-to-left (RTL) languages such as Persian, Arabic, and Hebrew in your application and website. Currently, when text is written or displayed in these languages, sentences are correctly aligned from right to left. However, any embedded English words or phrases within RTL text appear shuffled, out of order, or improperly formatted. This issue severely disrupts readability, creates confusion, and negatively impacts the user experience for RTL language speakers.  

To address this urgent issue, I kindly request the following actions:  
1. **Fix the RTL Text Rendering Bug**: Ensure that mixed-language content (e.g., Persian, Arabic, and Hebrew with embedded English) is displayed correctly, maintaining proper alignment and readability.  
2. **Add Support for Persian, Arabic, and Hebrew Languages**: Implement full localization for these languages, including interface translation and documentation, to make your platform more inclusive and accessible to RTL language users.  

This bug is a significant barrier for users who rely on RTL languages, and resolving it will greatly enhance the usability and global reach of your platform. I urge you to prioritize this fix in your upcoming updates.  

Thank you for your prompt attention to this matter. I look forward to seeing these improvements implemented soon.  

Best regards",2025-01-23T11:36:53Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/326
325,2806136004,[BUG] What is triton?,"**Describe the bug**
When I follow the guide and run `pip install -r requirements.txt`, it reports the error:


Does this package exist? I ran `pip index versions triton` and it says `No matching distribution found for triton`.
Thanks.
",2025-01-23T07:25:19Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/325
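323,2805825813,Was the model code for the training process not uploaded?,"I don't see the pre-training code. I'd like to look at the implementation details of Multi-Token Prediction (MTP).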
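For reference, the DeepSeek-V3 technical report defines the MTP training objective as shown below. Here $T$ is the input sequence length and $t_i$ is the ground-truth token at position $i$. $P_i^k[t_i]$ is the probability that the $k$-th MTP module assigns to $t_i$. $D$ is the number of prediction depths and $\lambda$ is a weighting factor. This restates the published loss only; it is not the unreleased training code.

$$
\mathcal{L}_{\text{MTP}}^{k} = \operatorname{CrossEntropy}\left(P_{2+k:T+1}^{k},\, t_{2+k:T+1}\right) = -\frac{1}{T}\sum_{i=2+k}^{T+1} \log P_i^k[t_i]
$$

$$
\mathcal{L}_{\text{MTP}} = \frac{\lambda}{D}\sum_{k=1}^{D} \mathcal{L}_{\text{MTP}}^{k}
$$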
",2025-01-23T03:39:18Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/323
322,2805775816,[BUG] No option to log in via Google,"**Describe the bug**
There is no option to log in via Google; only Chinese phone numbers are offered.

**To Reproduce**
Search for DeepSeek login on Google and click the login link.

**Expected behavior**
The login screen pops up but only allows Chinese phone number or WeChat login methods.

**Screenshots**


**Additional context**
",2025-01-23T02:50:19Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/322
321,2804982650,Light Mode Support for Code Snippets,"**Description:**

Hello guys! Hope you are doing fine. I noticed that code snippets only support dark mode. It would be great to have a light mode option as well, especially for users who prefer lighter themes or work in well-lit environments. A language-specific color scheme (as in Notion) would be the cherry on top.
",2025-01-22T17:40:01Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/321
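320,2803553902,How do I send image-and-text messages with the DeepSeek API?,"
The documentation says the DeepSeek API uses an OpenAI-compatible API format, but I can't send messages containing images. Is something wrong with my parameters, or does the model not support this?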

",2025-01-22T06:55:49Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/320
319,2803376443,Multiple Features Request,"Hi DeepSeek Team,

As a dedicated user, I’d like to request the following features to enhance DeepSeek’s usability and user experience:

### **1. Memory System, similar to ChatGPT**

Allows users to save preferences and information about them.

_User: “Remember I prefer lowercase in all messages, and a relevant emoji after every other sentence.”_

_AI: “k, got it 😌. all msgs will be in lowercase with emojis when needed 😉.”_

_User: ""My name is Matthew, I am 16 years old and live in California.""_

_AI: ""got it! you are matthew, 16, living in california.""_

Memory Stored: User prefers lowercase in all messages with relevant emojis when needed. User's name is Matthew. User is 16 years old and lives in California.

### **2. Personas, as seen on character.ai**

Let me save multiple personas with unique  

Example:

    Persona 1: Academic Tutor

        Tone: Formal, detailed explanations.

        Example: “The Krebs cycle involves three key steps:...”

    Persona 2: Casual Friend

        Tone: Slang, emojis, abbreviations.

        Example: “Yo, the Krebs cycle’s like 🔄 energy stuff, ya know?”

### **3. Easy Switching**

    Add a UI toggle to switch between saved personas and models.

    Example: Dropdown menu with options like “Persona: Academic Tutor | Casual Friend | Code Mentor”. 
                      Dropdown menu with model changing like V3, R1...

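Hi DeepSeek team,

As a user, I'd like to propose the following suggestions to improve DeepSeek's usability:

### **1. Memory system (similar to ChatGPT)**

_Example: the user says “Remember that I like emojis in my messages.” The AI replies “Got it 😌.” and then saves the memory._

### **2. Personas (similar to Character.ai)**


### **3. Quick switching**
Add a dropdown menu for quickly choosing a persona or model (e.g., V3, R1).

Thanks!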
Thank you.


",2025-01-22T04:53:59Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/319
318,2803278342,[BUG] Bug that triggers endless replies,"**Describe the bug**

URL used: 
When I submit the following question:

 
the web page keeps generating its answer and never stops unless I stop it manually.

 
**To Reproduce**

**Expected behavior**

**Screenshots**

**Additional context**
",2025-01-22T03:29:04Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/318
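317,2803271222,[BUG] Billing statistics error!,"
Yesterday the API miscounted usage: calls to the deepseek reasoner model got mixed up with deepseek coder. If you used Continue's default deepseek coder model selection, every request was billed as the reasoner model!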

",2025-01-22T03:21:40Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/317
316,2802339241,chat.deepseek.com,"I am not able to access my previous chats on chat.deepseek.com; it says ""Failed to load, you can retry loading.""",2025-01-21T16:46:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/316
315,2801993811,Alias o1 to deepseek-reasoner,"**Is your feature request related to a problem? Please describe.**
I use DeepSeek with Cursor, but the new R1 model is not supported there.

**Describe the solution you'd like**
Make o1 an alias for deepseek-reasoner.
Since Cursor supports OpenAI's o1 model, this should work",2025-01-21T14:21:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/315
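314,2801989228,Can the web chat export multi-turn conversations as a JSON file?,"**Is your feature request related to a problem? Please describe.**
Can the web chat export conversation content as a JSON file?
**Describe the solution you'd like**
I want to reuse the task demos I completed while trying things out and debugging on the web. I'd like to export a JSON file containing the Q&A, with a thumbs-up or thumbs-down tag on each answer. I would upload it as a reference during automated builds in Cline and other tools. When calling the paid DeepSeek API, local work would then build on the Q&A in the uploaded JSON file. That file would serve as a sample reference for checking that local results reach the level of completion achieved on the web.
**Describe alternatives you've considered**
Alternatively, export in TXT format.

**Additional context**
For reference: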


",2025-01-21T14:19:13Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/314
313,2801723162,Ask for Cursor support,"DeepSeek could be integrated into Cursor (or, hopefully, DeepSeek will develop its own Cursor-like product to bring the price down).",2025-01-21T12:33:31Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/313
312,2801162731,"Use with LangGraph, or other LLM dev framework.","I have been playing around with the LangGraph + DeepSeek V3 combination recently. However, I found that using ChatOpenAI to instantiate a DeepSeek V3 model leads to unexpected behaviour. For example, the framework cannot recognize AIMessages generated by DeepSeek V3. Tool calling is another painful case. Take a simple arithmetic calculation such as multiplying 2 and 3: it should finish in a single call, but it usually doesn't. Instead it falls into a long, sometimes infinite, loop. My guess is that the AIMessage is still not distinguished from the tool message, so the calling loop never ends. 

It would be nice if the behaviour aligned with other models. Perhaps a ChatDeepSeek class could make the output better formatted and structured. I also noticed other developers reporting the ""structured_output"" problem. So again, it would be nice if LLM development with DeepSeek were as simple as, or even handier than, with other models or frameworks. ",2025-01-21T09:07:29Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/312
311,2800801128,[BUG] DeepSeek insists it's GPT-4 when calling the API,"**Describe the bug**

When I call the DeepSeek API for the   model through the OpenAI SDK (I tried both .NET and Python), it insists it is   in its responses. What could be going wrong? Please let me know if I am missing something.

One interesting thing to note is **the part circled in red in the 2nd screenshot below**. It claims to be GPT-4 with a knowledge cut-off of October 2023, yet it knows about DeepSeek V3. However, this is not easy to reproduce in new chats. As the Python chat screenshot shows, it said there that it didn't know about DeepSeek.

_Try with Python_


_Try with .NET_


**To Reproduce**

It's basically the example code at  with a little enhancement.

Run the script, and ask questions like
-  
-   then  

 
It's also the same when using   directly:

 
**Expected behavior**

When asked what type of LLM it is and what its model name is, I expect it to reply ""deepseek"", just as it does in https 


**Screenshots**

Screenshots shared above.

**Additional context**

 
",2025-01-21T06:15:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/311
310,2800595370,Which embedding model does DeepSeek use?,"**Is your feature request related to a problem? Please describe.**
Using phidata requires an embedding model, e.g.:
db_url = ""postgresql+psycopg ai # Create a knowledge base of PDFs from URLs
knowledge_base = PDFUrlKnowledgeBase(
    urls=[""     # Use PgVector as the vector database and store embeddings in the   table
    vector_db=PgVector(
        table_name=""recipes"",
        db_url=db_url,
        search_type=SearchType.hybrid,
        embedder=OpenAIEmbedder(model=""text-embedding-3-small""),
    ),
)
Does DeepSeek have a model that works like embedder=OpenAIEmbedder(model=""text-embedding-3-small"")?
Or can model=""text-embedding-3-small"" be used directly?


**Describe the solution you'd like**
I hope you can provide some examples.

**Describe alternatives you've considered**

**Additional context**
",2025-01-21T03:12:15Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/310
309,2800282335,Integrating Anthropic's MCP,"MCP will allow DSV3 to access local files.  Moreover, because MCP is open source, you don't need to reinvent the wheel. You just need to integrate it into your agent software.",2025-01-20T21:17:09Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/309
308,2798204043,Feature Request: Chat Share Option in DeepSeek,"Is your feature request related to a problem? Please describe.
Yes, the problem is that I am unable to share chat conversations or data from DeepSeek with my friends or colleagues. Currently, there is no built-in feature to export or share chat history, which makes collaboration and sharing information inconvenient.

Describe the solution you'd like
I would like DeepSeek to introduce a ""Chat Share"" feature that allows users to:

Export chat conversations as a text file, PDF, or shareable link.

Share specific messages or the entire chat history with others via email, messaging apps, or social media.

Optionally, include a feature to password-protect shared chats for added privacy.

This feature would make it easier to collaborate, share insights, or simply save important conversations for future reference.

Describe alternatives you've considered

Manual Copy-Paste: Currently, I manually copy and paste chat content into a document or messaging app, but this is time-consuming and loses formatting.

Screenshots: Taking screenshots of the chat is another workaround, but it is not ideal for long conversations and lacks searchability.

These alternatives are inefficient and do not provide a seamless user experience.",2025-01-20T05:11:24Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/308
307,2798120817,LangChain embeddings not supported! Can LangChain embeddings be supported?,"#### Requirement
 When using DeepSeek V3 for RAG with LangChain, embedding data is needed.

",2025-01-20T04:00:08Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/307
306,2798101907,Could you provide some FP8 SFT demo scripts?,"Also, what is the minimum GPU memory configuration required to train DeepSeek-V3 in FP8, assuming seq-len=4096? 

Thanks",2025-01-20T03:40:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/306
305,2798044148,[Questions] Is there a KV cache in Multi-head Latent Attention?,"I'm reading the technical reports of DeepSeek-V2 and DeepSeek-V3, and I see that MLA is used in conjunction with RoPE. I have a small question: does the KV cache still work here? As I understand it, K and V are compressed into a latent matrix. Is the KV cache therefore not used, because K and V are recreated from scratch? ",2025-01-20T02:45:22Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/305
304,2797501349,Can I let the deepseek-chat model fetch information from the internet through the API?,"I have tried several approaches, but none of them worked.
 `c#
using System.Text.Json;
var client = new HttpClient();
var request = new HttpRequestMessage(HttpMethod.Post, "" request.Headers.Add(""Accept"",  
request.Headers.Add(""Authorization"", ""Bearer sk-d505bfb3fe33427595aefd613fc38957"");
var answer=""""""
{
  ""messages"": [
    {
      ""content"": ""You are a procurement inquiry specialist. Help me find the latest price for a suitable product from the internet. Only return the single price you consider most reasonable, and ensure it is in Chinese Yuan (RMB)."",
      ""role"": ""system""
    },
    {
      ""content"": ""水星5口全千兆交换机"",
      ""role"": ""user""
    }
  ],
  ""model"": ""deepseek-chat"",
  ""frequency_penalty"": 0.5,
  ""max_tokens"": 256,
  ""presence_penalty"": 0.5,
  ""response_format"": {
    ""type"": ""text""
  },
  ""stop"": null,
  ""stream"": false,
  ""stream_options"": null,
  ""temperature"": 0.7,
  ""top_p"": 1,
  ""tools"": null,
  ""tool_choice"": ""none"",
  ""logprobs"": false,
  ""top_logprobs"": null,
  ""currency"": ""CNY"",
  ""limit_results"": 1
}
"""""";
var content = new StringContent(answer, null,    
  content = new                      are a procurement inquiry specialist. Help me find the latest price for a suitable product from the internet. Only return the single price you consider most reasonable, and ensure it is in Chinese Yuan                                                                                                                                  null,  
request.Content = content;
var response = await client.SendAsync(request);
response.EnsureSuccessStatusCode();
var jsonResponse = await response.Content.ReadAsStringAsync();
Console.WriteLine(jsonResponse);
using var doc = JsonDocument.Parse(jsonResponse);
var result = doc.RootElement
                 .GetProperty(""choices"")[0]
                 .GetProperty(""message"")
                 .GetProperty(""content"")
                 .GetString();

Console.WriteLine(result);
 `",2025-01-19T08:33:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/304
303,2796790371,[BUG] DeepSeek V3 Does Not Respect `max_tokens` Parameter in LangChain with `ChatOpenAI()`,"**Description:**  

When using   with DeepSeek V3 in LangChain, the   parameter does not effectively limit the length of the model's output. Despite setting  , the model generates a response significantly longer than the specified limit. This behavior prevents users from controlling the response length, which is critical for applications requiring concise outputs.

**Steps to Reproduce:**

1. Install the required libraries:
    
The theory of relativity is a fundamental concept in physics, primarily developed by Albert Einstein. It consists of two parts: the Special Theory of Relativity and the General Theory of Relativity. Here's a simplified explanation of both:

### Special Theory of Relativity (1905)
1. **Speed of Light**: The speed of light in a vacuum is always the same, no matter how fast an observer is moving. This speed is approximately 299,792 kilometers per second (about 186,282 miles per second).
2. **Relativity of Simultaneity**: Events that appear simultaneous to one observer may not be simultaneous to another observer moving at a different speed.
3. **Time Dilation**: Time passes more slowly for an object in motion compared to one at rest. This effect becomes noticeable at speeds close to the speed of light.
4. **Length Contraction**: Objects in motion contract in the direction of motion as their speed approaches the speed of light.
5. **Mass-Energy Equivalence**: Energy (E) and mass (m) are interchangeable, as described by the famous equation   E = mc^2   where   c   is the speed of light.

### General Theory of Relativity (1915)
1. **Gravity as Curvature**: Gravity is not a force in the traditional sense but rather the result of the curvature of spacetime caused by mass and energy. Massive objects like planets and stars warp the fabric of spacetime, and this curvature affects the motion of objects.
2. **Equivalence Principle**: The effects of gravity are locally indistinguishable from acceleration. For example, if you were in a closed elevator, you wouldn't be able to tell if you were being pulled by gravity or if the elevator were accelerating upward.
3. **Gravitational Time Dilation**: Time runs slower in stronger gravitational fields. For instance, a clock on the surface of the Earth ticks more slowly than one in space.
4. **Light Bending**: Light bends when it passes near a massive object, a phenomenon known as gravitational lensing.

### Practical Implications
- **GPS Systems**: The precise timing required for GPS satellites must account for both special and general relativistic effects to provide accurate location data.
- **Black Holes**: General relativity predicts the existence of black holes, regions of spacetime where gravity is so strong that nothing, not even light, can escape.
- **Cosmology**: The theory underpins our understanding of the universe's expansion and the behavior of galaxies and cosmic structures.

In essence, the theory of relativity revolutionized our understanding of space, time, and gravity, showing that they are interwoven in a four-dimensional fabric called spacetime.

**Environment:**

- Python 3.x (Kaggle Notebook)
- Libraries:  ,  ,  ,  
- Model:   (via DeepSeek API)
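
One way to narrow down whether the limit reaches the API at all is to compare the reply's token count and finish reason against it. This minimal sketch assumes the `response_metadata` layout shown in the LangChain output of issue #302 (`token_usage.completion_tokens` and `finish_reason`). `max_tokens_report` is a hypothetical helper, not part of LangChain.

```python
from typing import Any


def max_tokens_report(response_metadata: dict[str, Any], max_tokens: int) -> dict[str, Any]:
    # Key layout as in the response_metadata LangChain attaches to DeepSeek replies:
    # token_usage -> completion_tokens, plus finish_reason.
    used = response_metadata['token_usage']['completion_tokens']
    finish = response_metadata.get('finish_reason')
    return {
        'completion_tokens': used,
        'within_limit': used <= max_tokens,
        # An OpenAI-compatible API that cuts a reply at max_tokens reports 'length'.
        'cut_off': finish == 'length',
    }
```

Call it on the message returned by `invoke`, for example `max_tokens_report(reply.response_metadata, max_tokens=limit)`. If `within_limit` is false and `cut_off` is false, the limit most likely never reached the API request.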
",2025-01-18T06:20:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/303
302,2796784643,[BUG] DeepSeek V3 Does Not Support Structured Output in LangChain with `ChatOpenAI()`,"**Description:**  

When using   with DeepSeek V3 in LangChain, the   method fails to enforce structured output formats (e.g., Pydantic models). The model returns an error indicating that the   type   is unavailable. This prevents the use of structured output functionality, which is critical for applications requiring consistent and predictable data formats.

**Steps to Reproduce:**

1. Install the required libraries:
Person(name=""John Doe"", age=30, email=""john.doe  
content='Here is the extracted   **Name**: John Doe    **Age**: 30    **Email**: john.doe additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 30, 'prompt_tokens': 33, 'total_tokens': 63, 'completion_tokens_details': None, 'prompt_tokens_details': None, 'prompt_cache_hit_tokens': 0, 'prompt_cache_miss_tokens': 33}, 'model_name': 'deepseek-chat', 'system_fingerprint': 'fp_3a5770e1b4', 'finish_reason': 'stop', 'logprobs': None} id='run-d078cad9-42a0-4be0-9e92-a593002a8606-0' usage_metadata={'input_tokens': 33, 'output_tokens': 30, 'total_tokens': 63, 'input_token_details': {}, 'output_token_details': {}}

---------------------------------------------------------------------------
UnprocessableEntityError                  Traceback (most recent call last)
<ipython-input-14-83b3c0097ccc> in <cell line: 18>()
     16 
     17 # Query the model with structured output
---> 18 response = structured_model.invoke(""Extract the name, age, and email of John Doe, who is 30 years old and has the email john.doe      19 print(response)

  in invoke(self, input, config, **kwargs)
   3018                 context.run(_set_config_context, config)
   3019                 if i == 0:
-> 3020                     input = context.run(step.invoke, input, config, **kwargs)
   3021                 else:
   3022                     input = context.run(step.invoke, input, config)

  in invoke(self, input, config, **kwargs)
   5350         **kwargs: Optional[Any],
   5351     ) -> Output:
-> 5352         return self.bound.invoke(
   5353             input,
   5354             self._merge_configs(config),

  in invoke(self, input, config, stop, **kwargs)
    284         return cast(
    285             ChatGeneration,
--> 286             self.generate_prompt(
    287                 [self._convert_input(input)],
    288                 stop=stop,

  in generate_prompt(self, prompts, stop, callbacks, **kwargs)
    784     ) -> LLMResult:
    785         prompt_messages = [p.to_messages() for p in prompts]
--> 786         return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
    787 
    788     async def agenerate_prompt(

  in generate(self, messages, stop, callbacks, tags, metadata, run_name, run_id, **kwargs)
    641                 if run_managers:
    642                     run_managers[i].on_llm_error(e, response=LLMResult(generations=[]))
--> 643                 raise e
    644         flattened_outputs = [
    645             LLMResult(generations=[res.generations], llm_output=res.llm_output)  # type: ignore[list-item]

  in generate(self, messages, stop, callbacks, tags, metadata, run_name, run_id, **kwargs)
    631             try:
    632                 results.append(
--> 633                     self._generate_with_cache(
    634                         m,
    635                         stop=stop,

  in _generate_with_cache(self, messages, stop, run_manager, **kwargs)
    849         else:
    850             if inspect.signature(self._generate).parameters.get(""run_manager""):
--> 851                 result = self._generate(
    852                     messages, stop=stop, run_manager=run_manager, **kwargs
    853                 )

  in _generate(self, messages, stop, run_manager, **kwargs)
    771             payload.pop(""stream"")
    772             try:
--> 773                 response = self.root_client.beta.chat.completions.parse(**payload)
    774             except openai.BadRequestError as e:
    775                 _handle_openai_bad_request(e)

  in parse(self, messages, model, audio, response_format, frequency_penalty, function_call, functions, logit_bias, logprobs, max_completion_tokens, max_tokens, metadata, modalities, n, parallel_tool_calls, prediction, presence_penalty, reasoning_effort, seed, service_tier, stop, store, stream_options, temperature, tool_choice, tools, top_logprobs, top_p, user, extra_headers, extra_query, extra_body, timeout)
    158             )
    159 
--> 160         return self._post(
    161              
    162             body=maybe_transform(

  in post(self, path, cast_to, body, options, files, stream, stream_cls)
   1281             method=""post"", url=path, json_data=body, files=to_httpx_files(files), **options
   1282         )
-> 1283         return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
   1284 
   1285     def patch(

  in request(self, cast_to, options, remaining_retries, stream, stream_cls)
    958             retries_taken = 0
    959 
--> 960         return self._request(
    961             cast_to=cast_to,
    962             options=options,

  in _request(self, cast_to, options, retries_taken, stream, stream_cls)
   1062 
   1063             log.debug(""Re-raising status error"")
-> 1064             raise self._make_status_error_from_response(err.response) from None
   1065 
   1066         return self._process_response(

UnprocessableEntityError: Failed to deserialize the JSON body into the target type: response_format: response_format.type   is unavailable now at line 1 column 626

**Root Cause:**  

The DeepSeek V3 API does not currently support the   parameter required by LangChain's   method. This limitation prevents the model from returning structured outputs in the requested format.
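
A possible workaround, not verified here against the DeepSeek API, is to request plain JSON instead of a JSON schema. In LangChain this means `with_structured_output(Person, method='json_mode')`. That call asks for `response_format={'type': 'json_object'}` and requires the prompt itself to spell out the expected keys. Alternatively, prompt for JSON and validate the reply yourself. The sketch below covers only the validation step. `parse_person` is hypothetical, and a stdlib dataclass stands in for the Pydantic model.

```python
import json
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    email: str


def parse_person(raw: str) -> Person:
    # Expects the model reply to be one JSON object with name, age and email.
    # Malformed JSON raises json.JSONDecodeError, a ValueError subclass.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError('expected a JSON object')
    missing = {'name', 'age', 'email'} - set(data)
    if missing:
        raise ValueError(f'missing keys: {sorted(missing)}')
    return Person(name=str(data['name']), age=int(data['age']), email=str(data['email']))
```

Pass `reply.content` from a JSON-mode call to `parse_person`. It either yields a `Person` or raises `ValueError`, which the caller can use to trigger a retry.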

**Environment:**

- Python 3.x (Kaggle Notebook)
- Libraries:  ,  ,  ,  
- Model:   (via DeepSeek API)",2025-01-18T06:05:19Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/302
301,2796780609,[BUG] DeepSeek V3 API Call Does Not Stop Automatically in LangChain with `ChatOpenAI()`,"**Describe the bug**

When using   in LangChain with a custom tool (e.g.,  ), the API call does not terminate automatically after completing the task. Instead, it continues to make repeated tool calls indefinitely, leading to an infinite loop of search queries and responses.
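
Until the root cause is found, a client-side guard can stop the runaway loop by counting tool rounds and giving up after a fixed number. This is a minimal, framework-agnostic sketch. `call_model` and `run_tool` are hypothetical stand-ins for the chat-model call and the tool executor (such as `search_web` in the log below). Messages are plain dicts rather than LangChain message objects. In LangGraph, the `recursion_limit` entry of the run config caps the number of graph steps in a similar way.

```python
from typing import Callable

Message = dict


def run_agent(call_model: Callable[[list], Message],
              run_tool: Callable[[Message], Message],
              max_tool_rounds: int = 3) -> list:
    # call_model(messages) returns the assistant reply as a dict; a reply that
    # requests tools carries a non-empty 'tool_calls' list.
    messages: list = []
    tool_rounds = 0
    while True:
        reply = call_model(messages)
        messages.append(reply)
        if not reply.get('tool_calls'):
            return messages  # final answer: no further tool requested
        if tool_rounds == max_tool_rounds:
            raise RuntimeError(f'model still requests tools after {max_tool_rounds} rounds')
        messages.append(run_tool(reply))  # feed the tool result back to the model
        tool_rounds += 1
```

With `max_tool_rounds=3`, a model that keeps asking for tools gets at most three tool executions. Its fourth request raises instead of looping forever.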

**To Reproduce**
1. Install the required libraries:
    
================================ Human Message =================================

Search for the latest news on AI advancements
================================== Ai Message ==================================
Tool Calls:
  search_web (call_0_bbac5567-4a8c-4e13-8752-ad37a91f6f27)
 Call ID: call_0_bbac5567-4a8c-4e13-8752-ad37a91f6f27
  Args:
    query: latest news on AI advancements 2023
================================= Tool Message =================================
Name: search_web

With broad AI comes broad risks in everything from misinformation to AI-related privacy risks, failures, and mistakes. 2023 saw an increasing flow of regulation, from the United States AI Bill of ... 2022 was the year that generative artificial intelligence (AI) exploded into the public consciousness, and 2023 was the year it began to take root in the business world. 2024 thus stands to be a pivotal year for the future of AI, as researchers and enterprises seek to establish how this evolutionary leap in technology can be most practically integrated into our everyday lives. While overall AI private investment decreased in 2023, funding for generative AI sharply increased. The sector attracted $25.2 billion last year, nearly nine times the investment of 2022 and about 30 times the amount in 2019. Generative AI accounted for over a quarter of all AI-related private investment in 2023. Artificial intelligence. Download RSS feed: News Articles   In the Media   Audio. Displaying 1 - 15 of 1293 news articles related to this topic. Show: News Articles. In the Media. Audio. Explained: Generative AI's environmental impact. Rapid development and deployment of powerful generative AI models comes with environmental consequences ... GPU chips play a pivotal role in enhancing overall performance for genAI applications, underscoring their significance for future advancements. Defining AI trends for 2023. Amid the myriad of AI trends and themes that characterized 2023, four major themes emerged, shaping the narrative and capturing widespread attention:
================================== Ai Message ==================================
Tool Calls:
  search_web (call_0_744fb5ec-24f1-4e8d-b3d0-5704b72b60a7)
 Call ID: call_0_744fb5ec-24f1-4e8d-b3d0-5704b72b60a7
  Args:
    query: latest AI advancements 2023
================================= Tool Message =================================
Name: search_web

Uncover the transformative AI achievements of 2023, with a focus on NLP, computer vision, robotics, voice recognition, and HCI. This comprehensive overview highlights the latest advancements and their far-reaching impacts on technology, industry, and society. Top AI achievements of 2023. AI achievements in 2023 were quite phenomenal. March 30, 2023: GitHub presents AutoGTP based on ChatGPT-4.; May 12, 2023: OpenAI releases ChatGPT Plugins; May 25, 2023: Nvidia's stock price tripled in less than eight months, surging by 27% overnight and approaching a market cap of $1 trillion, driven by the skyrocketing demand for GPUs due to the growing interest in AI.; May 25, 2023: Google launches Google Search Generative Experience. The year 2023 marked a significant period in the evolution of artificial intelligence (AI), characterized by groundbreaking advancements across various domains. From the development of ... 2022 was the year that generative artificial intelligence (AI) exploded into the public consciousness, and 2023 was the year it began to take root in the business world. 2024 thus stands to be a pivotal year for the future of AI, as researchers and enterprises seek to establish how this evolutionary leap in technology can be most practically integrated into our everyday lives. AI Highlights from May 2023. Advancements in robotics took center stage in May 2023 as an AI system enabled robots to autonomously conduct scientific experiments. While Sanctuary AI's Phoenix Robot became the world's first commercially available general-purpose humanoid robot, Tesla's Optimus Bot also showed significant progress. OpenAI ...
================================== Ai Message ==================================
Tool Calls:
  search_web (call_0_ba4b87e9-2096-49f2-840c-f1790cacc0f4)
 Call ID: call_0_ba4b87e9-2096-49f2-840c-f1790cacc0f4
  Args:
    query: AI advancements 2023
================================= Tool Message =================================
Name: search_web

The report, which is in its seventh edition, covers trends such as technical advancements in AI, public perceptions of the technology and the geopolitical dynamics surrounding its development. Emerging Technologies ... In 2023, 25 AI-related regulations were enacted in the US, growing the total number by 56.3%. Compare that to 2016, when just ... 2023 AI Recap: Innovations in AI Technology. OpenAI's GPT-4: Revolutionized user interactions with AI through advanced natural language understanding, setting a new standard in generative models.; Multimodal Capabilities: GPT-4's ability to process various types of data, including text and images, marked a significant step towards more versatile AI systems. This year's AI Index — a 500-page report tracking 2023's worldwide trends in AI — is out.. The index is an independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), led by the AI Index Steering Committee, an interdisciplinary group of experts from across academia and industry. This year's report covers the rise of multimodal foundation models ... 2022 was the year that generative artificial intelligence (AI) exploded into the public consciousness, and 2023 was the year it began to take root in the business world. 2024 thus stands to be a pivotal year for the future of AI, as researchers and enterprises seek to establish how this evolutionary leap in technology can be most practically integrated into our everyday lives. GPU chips play a pivotal role in enhancing overall performance for genAI applications, underscoring their significance for future advancements. Defining AI trends for 2023. Amid the myriad of AI trends and themes that characterized 2023, four major themes emerged, shaping the narrative and capturing widespread attention:
================================== Ai Message ==================================
Tool Calls:
  search_web (call_0_9d944700-80fc-4092-a60c-d75e36d351d1)
 Call ID: call_0_9d944700-80fc-4092-a60c-d75e36d351d1
  Args:
    query: AI trends 2023
================================= Tool Message =================================
Name: search_web

This year's AI Index — a 500-page report tracking 2023's worldwide trends in AI — is out.. The index is an independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), led by the AI Index Steering Committee, an interdisciplinary group of experts from across academia and industry. This year's report covers the rise of multimodal foundation models ... Defining AI trends for 2023. Amid the myriad of AI trends and themes that characterized 2023, four major themes emerged, shaping the narrative and capturing widespread attention: Business adoption: Although consumer genAI applications grabbed headlines, 2023 also saw increased business experimentation. Enterprise adoption, startup activity and ... Get 90+ pages of charts and data detailing the latest venture trends in AI. First name. Last name. Email. ... In 2023, AI startups raised $42.5B across 2,500 equity rounds. Although down 10% year-over-year (YoY), AI funding fell far less than broader venture funding (-42% in 2023). AI deal volume decreased by 24% YoY — also less than the ... 2022 was the year that generative artificial intelligence (AI) exploded into the public consciousness, and 2023 was the year it began to take root in the business world. 2024 thus stands to be a pivotal year for the future of AI, as researchers and enterprises seek to establish how this evolutionary leap in technology can be most practically integrated into our everyday lives. March 30, 2023: GitHub presents AutoGTP based on ChatGPT-4.; May 12, 2023: OpenAI releases ChatGPT Plugins; May 25, 2023: Nvidia's stock price tripled in less than eight months, surging by 27% overnight and approaching a market cap of $1 trillion, driven by the skyrocketing demand for GPUs due to the growing interest in AI.; May 25, 2023: Google launches Google Search Generative Experience.
================================== Ai Message ==================================
Tool Calls:
  search_web (call_0_3a3d86af-8b05-482a-ac02-4fc1c4437cfc)
 Call ID: call_0_3a3d86af-8b05-482a-ac02-4fc1c4437cfc
  Args:
    query: AI regulations 2023
================================= Tool Message =================================
Name: search_web

New Framework Advances AI Innovation While Protecting U.S. National Security. Washington, D.C. — Today, the Department of Commerce's Bureau of Industry and Security (BIS) announced controls on advanced computing chips and certain closed artificial intelligence (AI) model weights, alongside new license exceptions and updates to the Data Center Validated End User (VEU) authorization. Section 1. Purpose. Artificial intelligence (AI) is a defining technology of our era. ... and Executive Order 14110 of November 1, 2023 (Safe, Secure, and Trustworthy Development and Use of ... In October 2023, President Joe Biden issued a broad executive order on AI (AI Order), which the administration touted as a vehicle to establish AI safety and security standards while protecting privacy, advancing civil rights and promoting innovation. However, most of the AI Order was a series of directives to various federal agencies to study ... In October 2023, President Joe Biden issued a broad executive order on AI ... we expect to see states take an even more active role in enacting state-specific AI regulations. Monitor Legal and Regulatory Developments: Stay informed about new state and federal AI regulations, along with the developing case law related to AI and ERISA litigation. Establish a process for ...
================================== Ai Message ==================================
Tool Calls:
  search_web (call_0_10269960-464f-4abd-a458-b863da1710b4)
 Call ID: call_0_10269960-464f-4abd-a458-b863da1710b4
  Args:
    query: AI investment trends 2023
  search_web (call_1_97eabda5-9144-4a28-93f9-e21ec851b5ba)
 Call ID: call_1_97eabda5-9144-4a28-93f9-e21ec851b5ba
  Args:
    query: AI private investment 2023
================================= Tool Message =================================
Name: search_web

Private equity and venture capital firms poured more than twice as much money into generative AI companies in 2023 compared with the prior year, and dealmaking got off to a fast start in 2024. ... Incera said. Private equity-backed investment peaked at $927.7 million in the third quarter of 2023, up from $121.5 million recorded in the prior ... The volume of private investment in artificial intelligence (AI) worldwide in 2023 had its largest share concentrated in the United States of America, with approximately 67.2 billion U.S. Generative AI accounted for over a quarter of all AI-related private investments in 2023. U.S. Wins $$ Race. And again, in 2023 the United States dominates in AI private investment. In 2023, the $67.2 billion invested in the U.S. was roughly 8.7 times greater than the amount invested in the next highest country, China, and 17.8 times the amount ... The graph below shows the yearly number of private equity platform and add-on investments in AI. Quick stats. 315 Private equity firms with current investments in the Artificial Intelligence and Machine Learning industry. 573 current private equity investments in   companies (2013 - 2023, 52 exits) Download our free list As from 2013 to 2023, the sum of private investments in artificial intelligence (AI) worldwide has had its largest share coming from the United States of America, accounting for over 336 billion U.S.
================================== Ai Message ==================================
Tool Calls:
  search_web (call_0_1667e946-5452-48ab-b5e2-d6e3fb9a526b)
 Call ID: call_0_1667e946-5452-48ab-b5e2-d6e3fb9a526b
  Args:
    query: AI investment trends 2023
  search_web (call_1_d7bdf09b-5892-4e60-91a8-63a4b898a209)
 Call ID: call_1_d7bdf09b-5892-4e60-91a8-63a4b898a209
  Args:
    query: AI private investment 2023
================================= Tool Message =================================
Name: search_web

Private equity and venture capital firms poured more than twice as much money into generative AI companies in 2023 compared with the prior year, and dealmaking got off to a fast start in 2024. ... Incera said. Private equity-backed investment peaked at $927.7 million in the third quarter of 2023, up from $121.5 million recorded in the prior ... The volume of private investment in artificial intelligence (AI) worldwide in 2023 had its largest share concentrated in the United States of America, with approximately 67.2 billion U.S. Generative AI accounted for over a quarter of all AI-related private investments in 2023. U.S. Wins $$ Race. And again, in 2023 the United States dominates in AI private investment. In 2023, the $67.2 billion invested in the U.S. was roughly 8.7 times greater than the amount invested in the next highest country, China, and 17.8 times the amount ... Figure 3 shows that no country, at least on a private investment basis, is spending anything close to what the U.S. is spending, which likely means it will be difficult to challenge U.S. leadership in this area. Figure 3: Private Investment in AI by Geographic Area, 2023. 2023 Was a Year of Building AI Applications As from 2013 to 2023, the sum of private investments in artificial intelligence (AI) worldwide has had its largest share coming from the United States of America, accounting for over 336 billion U.S.
================================== Ai Message ==================================
Tool Calls:
  search_web (call_0_cb9e01b0-7850-4038-a727-aa7086c97191)
 Call ID: call_0_cb9e01b0-7850-4038-a727-aa7086c97191
  Args:
    query: AI investment trends 2023
  search_web (call_1_d658f5fe-7db0-42e1-81a6-d003f231efe6)
 Call ID: call_1_d658f5fe-7db0-42e1-81a6-d003f231efe6
  Args:
    query: AI private investment 2023
  search_web (call_2_7ccf4fdf-3921-41b9-b805-9157d25998be)
 Call ID: call_2_7ccf4fdf-3921-41b9-b805-9157d25998be
  Args:
    query: AI regulations 2023
  search_web (call_3_df9e132a-1328-4db8-8984-3051c59dcee2)
 Call ID: call_3_df9e132a-1328-4db8-8984-3051c59dcee2
  Args:
    query: AI advancements 2023
  search_web (call_4_46a83a96-205c-47c0-8c67-94d07ac91894)
 Call ID: call_4_46a83a96-205c-47c0-8c67-94d07ac91894
  Args:
    query: latest AI advancements 2023
  search_web (call_5_d506554b-9025-407b-8cdc-9154d547ab88)
 Call ID: call_5_d506554b-9025-407b-8cdc-9154d547ab88
  Args:
    query: latest news on AI advancements 2023
================================= Tool Message =================================
Name: search_web

With broad AI comes broad risks in everything from misinformation to AI-related privacy risks, failures, and mistakes. 2023 saw an increasing flow of regulation, from the United States AI Bill of ... Artificial intelligence. Download RSS feed: News Articles   In the Media   Audio. Displaying 1 - 15 of 1293 news articles related to this topic. Show: News Articles. In the Media. Audio. Explained: Generative AI's environmental impact. Rapid development and deployment of powerful generative AI models comes with environmental consequences ... March 30, 2023: GitHub presents AutoGTP based on ChatGPT-4.; May 12, 2023: OpenAI releases ChatGPT Plugins; May 25, 2023: Nvidia's stock price tripled in less than eight months, surging by 27% overnight and approaching a market cap of $1 trillion, driven by the skyrocketing demand for GPUs due to the growing interest in AI.; May 25, 2023: Google launches Google Search Generative Experience. GPU chips play a pivotal role in enhancing overall performance for genAI applications, underscoring their significance for future advancements. Defining AI trends for 2023. Amid the myriad of AI trends and themes that characterized 2023, four major themes emerged, shaping the narrative and capturing widespread attention: AI Highlights from May 2023. Advancements in robotics took center stage in May 2023 as an AI system enabled robots to autonomously conduct scientific experiments. While Sanctuary AI's Phoenix Robot became the world's first commercially available general-purpose humanoid robot, Tesla's Optimus Bot also showed significant progress. OpenAI ...
 `


",2025-01-18T05:54:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/301
300,2796606266,[BUG] Missing v3 tokenizer, links the v2 tokenizer still,2025-01-18T03:07:44Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/300
299,2795751226,[BUG] Deep Think feature Korean generation error,"I turned on the Deep Think feature and asked questions in Korean, but the AI's answers were a mixture of Korean, Chinese, and Japanese.


",2025-01-17T15:33:50Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/299
298,2795294313,confusion in ParallelEmbedding layer in model.py,"Hi there,

In the ParallelEmbedding layer, as defined in   by the following code:

 
If I understand it correctly, on each rank the local embedding matrix handles the portion of token ids specified by the range  , excluding the right-hand side.

The line   shifts the token ids to align them with the starting index assigned to the current rank, so that they can be processed by the local embedding matrix (which starts from token id 0). For instance, if the local rank handles token ids [100, 199], token id 100 in the original input tensor x is shifted to become 0.

The next line masks out the out-of-range token ids so that they also become 0.

What confuses me is how I would then tell whether a token id of 0 in the input tensor x came from 1) a token id that was originally out of range for the current rank, or 2) a valid token id that originally equaled self.vocab_start_idx. It seems to me that these two lines of code have shrunk the valid range of token ids handled by the current rank to  , exclusive on both ends.
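One way to check this concretely is a small pure-Python simulation of a vocab-parallel lookup. It rests on two assumptions that are not visible in the snippet discussed here. First, the mask computed before the shift is kept and applied again to the looked-up rows. Second, a sum across ranks stands in for an all-reduce. The helper names local_lookup and parallel_embedding are hypothetical. Under those assumptions, a local id of 0 survives only when it came from a token equal to the shard's start index, so the two cases stay distinguishable:

```python
# Simulated vocab-parallel embedding (no torch needed).
# Assumption: the out-of-range mask is reused after the lookup,
# and summing over ranks plays the role of dist.all_reduce.

def local_lookup(ids, weight, start, end):
    # mask marks ids that do not belong to this rank's shard [start, end)
    mask = [not (start <= t < end) for t in ids]
    # shift into local coordinates; masked ids are clamped to 0 so the lookup stays in bounds
    local = [0 if m else t - start for t, m in zip(ids, mask)]
    rows = [list(weight[i]) for i in local]
    # reuse the mask: rows fetched for out-of-range ids are zeroed
    return [[0.0] * len(r) if m else r for r, m in zip(rows, mask)]


def parallel_embedding(ids, full_weight, world_size):
    part = len(full_weight) // world_size
    dim = len(full_weight[0])
    out = [[0.0] * dim for _ in ids]
    for rank in range(world_size):
        start, end = rank * part, (rank + 1) * part
        partial = local_lookup(ids, full_weight[start:end], start, end)
        # stands in for the cross-rank sum (all-reduce)
        for i, row in enumerate(partial):
            out[i] = [a + b for a, b in zip(out[i], row)]
    return out
```

If the second use of the mask were dropped, every out-of-range id on a rank would add that rank's row 0 to the sum. That is exactly the ambiguity described above.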

I haven't run the code yet, so this confusion might come from an oversight on my part. In the given code world_size=1 is fixed, so this snippet never gets executed anyway. I'd appreciate some clarification or guidance on tests.",2025-01-17T11:48:43Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/298
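297,2794898677,How should DeepSeek be deployed to support function call?,"I have tried deploying with both sglang and vllm, and neither supports function call, yet it works when I call the official API. How should I deploy it to get function call support?
Looking forward to your reply~",2025-01-17T08:46:58Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/297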
296,2794885621,How much GPU memory does deploying deepseek-V3 with ollama require?,How much GPU memory does deploying deepseek-V3 with ollama require?,2025-01-17T08:39:33Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/296
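295,2794277923,How many A100 GPUs is the optimal configuration for running the model?,"I'd like to ask for a recommended optimal configuration for running DeepSeek-V3: how many A100 GPUs are needed, and is multi-node deployment supported?",2025-01-17T02:06:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/295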
294,2793629031,[BUG] RTL Arabic Direction,"**Describe the bug**
We need DeepSeek to support the text direction of Arabic output:
the default direction is LTR, but Arabic is written RTL, which sometimes makes the text difficult to read.",2025-01-16T19:41:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/294
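293,2792230871,An issue about pretraining deepseek v3,"For the pre-training stage, the report says that each layer of the model is placed on a different GPU for training. How are the gradients of each layer computed after this partitioning? For example, suppose the input passed from the first layer to the second is h1, and back-propagation through the second layer yields the gradient with respect to h1 (call it g1). When the first layer's parameters are updated, is the chain rule still used (g1 multiplied by the derivative of h1 with respect to those parameters)? Or is training done layer by layer (treating each layer as an independent model, differentiating the loss directly with respect to h1 and then applying the chain rule to update the other gradients)? If it is the former, how is the vanishing gradient problem handled? (For exploding gradients, I noticed that gradient-norm clipping with a value of 1.0 seems to be used.) Thanks for your answer!",2025-01-16T09:54:26Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/293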
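292,2792083477,How do I enable RDMA networking for inter-machine communication when running DeepSeek V3 inference on multiple H100 machines?,How do I enable RDMA networking for inter-machine communication when running DeepSeek V3 inference on multiple H100 machines?,2025-01-16T08:52:22Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/292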
291,2791680959,The parameter n can only be set to 1 when testing the model via the API,"I need to test code-generation tasks; the metric is pass 
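The metric name is cut off above, so assume for illustration that it is pass@k. The usual unbiased estimator for pass@k comes from the Codex paper (Chen et al., 2021). It needs several samples per problem: with n samples of which c pass, pass@k = 1 - C(n - c, k) / C(n, k). Here is a minimal sketch that shows why n = 1 only supports estimating pass@1. The helper names are hypothetical:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    # n: samples generated per problem
    # c: samples that pass the unit tests
    # k: sampling budget being evaluated (k <= n)
    if n - c < k:
        # every size-k subset contains at least one passing sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    # results: list of (n, c) pairs, one per problem
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

If the API only accepts n = 1, the same estimate could in principle be computed from n separate requests per problem, provided sampling is enabled (temperature above 0).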
",2025-01-16T04:41:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/291
290,2791658492,[BUG] Fail to import is_torch_greater_or_equal_than_1_13 since transformers v4.48.0 for all deepseek models,"**Describe the bug**
Fails to import is_torch_greater_or_equal_than_1_13 since transformers v4.48.0, for all DeepSeek models.

**To Reproduce**
Install transformers v4.48.0 and run any deepseek model.

**Expected behavior**
DeepSeek models run with transformers v4.48.0.

**Additional context**
is_torch_greater_or_equal_than_1_13 has been removed as of transformers v4.48.0, so every use of it must be removed from all DeepSeek models, not only this one. Could the maintainers help fix all the models? This is important for us to provide DeepSeek model support in TensorRT-LLM. Thanks~
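Until the modeling files are updated, one possible stopgap is to compute the flag locally when the import fails. This is a minimal sketch. version_at_least is a hypothetical helper, not part of transformers, and it only compares the leading major and minor numbers of the installed torch version:

```python
import re


def version_at_least(version, minimum):
    # Compare the leading numeric components of a version string such as
    # '2.5.1+cu121' against a tuple like (1, 13).
    release = version.split('+')[0]
    parts = tuple(int(p) for p in re.findall(r'\d+', release)[:len(minimum)])
    return parts >= tuple(minimum)


try:
    # transformers releases before v4.48.0 still export this flag
    from transformers.pytorch_utils import is_torch_greater_or_equal_than_1_13
except ImportError:
    try:
        import torch
        is_torch_greater_or_equal_than_1_13 = version_at_least(torch.__version__, (1, 13))
    except ImportError:
        # torch is not installed; the flag is moot because the model cannot run anyway
        is_torch_greater_or_equal_than_1_13 = False
```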
",2025-01-16T04:23:24Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/290
288,2790716638,[BUG] No verification code received by email at signup,"**Describe the bug**
No verification code arrives by email at signup. I am in a European Union country.

**To Reproduce**
Start creating a new account, enter an email address and password, and click send code.

**Expected behavior**
Receive a verification code by email.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Additional context**
protonmail.",2025-01-15T19:54:38Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/288
287,2790249571,MoE infra question/suggestion,"


The idea is to have multiple instances of each expert, i.e., a pool of instances for expert number M that the router sends requests to. That way, the number of router instances scales with the number of user requests, while the number of expert instances scales at a slower pace (i.e., more efficient expert use). (But maybe you already implement something similar?)

You could then collect statistics on which experts are used most and train new models accordingly.

I also have a question: why is token-level routing used rather than routing the full user request? Would latency be an issue then?
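The sketch below illustrates token-level routing with a generic softmax top-k gate. The gate and the scores are hypothetical, and DeepSeek-V3's actual gating differs in detail. The point is only that each token in the same request selects its own experts, and that counting those selections yields the kind of per-expert usage statistics mentioned above:

```python
import math
from collections import Counter


def softmax(xs):
    # subtract the max for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_tokens(token_scores, top_k):
    # token_scores[t][e]: router score of token t for expert e
    routes = []
    for scores in token_scores:
        probs = softmax(scores)
        ranked = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)
        routes.append(sorted(ranked[:top_k]))
    return routes


def expert_load(routes):
    # how many token slots each expert received
    return Counter(e for chosen in routes for e in chosen)
```

With scores such as [[0.1, 2.0, 0.3, 1.5], [3.0, 0.0, 0.2, 0.1]] and top_k=2, the two tokens of one request land on disjoint expert pairs. Routing the whole request would instead force a single expert choice for all of its tokens.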
",2025-01-15T16:07:08Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/287
286,2789083461,PDF Upload functionality in API,The DeepSeek GUI allows uploading a PDF and asking questions about it. Is similar functionality available in the API?,2025-01-15T07:56:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/286
285,2788109833,Max input size. Feature request.,"Hi!

For the next release, please consider increasing the API's 'max input size' limit. I think it is different from the context window: it is just the maximum size of the input text, and it is a very limiting factor. Increasing this limit to at least 1M input tokens would be a significant improvement.

Additionally, it would be beneficial if we had an option to decrease the context window to 0 and increase the max input size to its maximum possible value. This would provide more flexibility for users who require it. 

Thank you! ",2025-01-14T19:22:19Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/285
283,2788000465,Question about MLA with TP,"The DeepSeek paper mentions 4-way TP for the MLA attention layer at inference time. However, from the code, it seems that each card has its own Linear module (e.g.  ) to project KV latents to num_heads, and that either the KV or the linear module is sharded across TP.

Is the KV duplicated across all TP ranks in this case? Or does each card have different KVs?",2025-01-14T18:26:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/283
275,2787363918,[BUG] RTL problem in website, ,2025-01-14T14:45:53Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/275
274,2787098739,compare base and original version,Compare DeepSeek-V3-Base and DeepSeek-V3.,2025-01-14T13:13:22Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/274
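273,2786564167,[BUG] English tokenization issue and findings on text sources.,"Following up on  , I sorted the English words in the vocabulary by length; the output is below: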
 
They turned out to be mostly specialized terms, so I gave them to Claude and asked about their domains and sources.
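The sorting step described above can be reproduced roughly as follows. This sketch assumes that the vocabulary comes from a Hugging Face tokenizer.json whose model.vocab section maps token strings to ids. It also assumes that leading spaces are encoded with the byte-level BPE marker Ġ (U+0120). load_vocab and longest_english_tokens are hypothetical helpers:

```python
import json


def load_vocab(path):
    # tokenizer.json of a BPE tokenizer: {'model': {'vocab': {token: id}}}
    with open(path, encoding='utf-8') as f:
        return json.load(f)['model']['vocab']


def longest_english_tokens(vocab, limit=10):
    words = set()
    for token in vocab:
        # byte-level BPE marks a leading space with U+0120
        word = token.lstrip('\u0120')
        if word.isascii() and word.isalpha():
            words.add(word)
    # longest first; ties broken alphabetically
    return sorted(words, key=lambda w: (-len(w), w))[:limit]
```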

 
Claude answered as follows:


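Based on the characteristics of these long words, here is my analysis of the likely specialized domains and text sources:

# Domain analysis

## Medicine and biomedicine
* Words such as pharmacological, epidemiological, and atherosclerosis clearly come from the medical domain, particularly drug research, disease research, and clinical medicine
* pathophysiology and musculoskeletal indicate that the text contains a large amount of human-medicine content
* corticosteroids and bioavailability reflect specialized content from drug research and pharmaceutical science

## Academic research
* bibliographical and reproducibility suggest that the text includes many academic research papers
* epistemological and physicochemical indicate fundamental scientific research
* collaboratively points to content about research collaboration

## Education
* extracurricular is clearly related to education
* retrospectively and autobiographical suggest content about educational experiences and personal development

# Possible text sources

## Academic literature
* Medical journal articles
* Biomedical research reports
* Clinical trial reports
* Research grant proposals and final project reports

## Professional publications
* Medical textbooks
* Drug research handbooks
* Clinical guidelines
* Specialist review articles

## Educational materials
* University course materials
* Graduate training programs
* Continuing education materials
* Professional training documents

## Industry literature
* Pharmaceutical company research reports
* Technical documents from medical institutions
* Pharmaceutical industry standards
* Guidance documents issued by professional associations

These words are highly specialized and academic, which suggests that the corpus was most likely collected from academic databases (such as PubMed and ScienceDirect) as well as teaching materials from medical schools and research institutions. The primary audience of the text is likely medical researchers, healthcare practitioners, biomedical scholars, and students in related fields.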


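I'm actually curious how DeepSeek handles BPE tokenization for code: I found no code-related words among the long tokens. For comparison, when I analyzed the vocabulary of Qwen2.5, the long tokens were all code, with none of the academic-paper vocabulary found in yours. I have also filed an issue for Qwen2.5  . Of course, one could also construct prompts that ask the model about tokenization or to count letters, but I won't do that here.",2025-01-14T08:41:50Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/273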
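272,2786461951,[BUG] DeepSeek-V3 deployed with 4-bit weight-only quantization via the DeepSeek branch of TensorRT-LLM returns garbled answers,"**Describe the bug**
DeepSeek-V3 deployed with 4-bit weight-only quantization via the DeepSeek branch of TensorRT-LLM returns garbled answers.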

**To Reproduce**
Following the description in the DeepSeek-V3 README, I used the steps below to build a 4-bit weight-only engine (converting to BF16 first, then quantizing):
 
However, the converted model outputs garbled text. I have also seen similar reports in the TensorRT-LLM issues. Has anyone tried deploying this way?
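For context, 4-bit weight-only (w4a16) quantization stores weights as 4-bit integers plus scales and dequantizes them to 16-bit for the matmul, while activations stay in 16-bit. The sketch below is a generic symmetric per-group scheme for illustration only. It is not necessarily the scheme TensorRT-LLM uses, and the helper names are hypothetical. Comparing dequantized weights against the BF16 originals in this way can help separate a quantization error from an engine-build problem:

```python
def quantize_int4(weights, group_size):
    # symmetric per-group quantization: each group of weights shares one scale
    qs, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # all-zero group: any scale works
        scales.append(scale)
        # map to the signed 4-bit range [-8, 7]
        qs.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qs, scales


def dequantize_int4(qs, scales, group_size):
    # weight-only: integers are scaled back to floating point before the matmul
    return [q * scales[i // group_size] for i, q in enumerate(qs)]
```

With this scheme the round-trip error of each weight is at most half of its group's scale. Errors much larger than that would point elsewhere.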
 

**Expected behavior**
Normal output, with only a slight loss of accuracy.

**Screenshots**
 
 
",2025-01-14T07:39:13Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/272
270,2786204842,"Questions on the workflow of all-to-all combine, and MoE Experts placement on 320 GPUs","Thank you for the great work!

First of all, I was wondering how you place the 256 MoE experts on 320 GPUs to achieve the best performance.

Also, could you explain a bit more about the MoE all-to-all combine process (such as the overall workflow of the gate and the routed and shared experts)?

Kind regards",2025-01-14T04:21:06Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/270
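269,2784253612,API calls keep getting slower and take over 10 minutes to respond,"In Cline, API calls are abnormally slow. They are fine at the start of a task, but after a few calls they become extremely slow, so I had to switch to another API and spent over 20 yuan. I would rather have topped up those 20-odd yuan into your account. Is it because I haven't paid? At this speed, I probably won't use up the 5 million free tokens you gave me before they expire.",2025-01-13T15:04:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/269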
266,2783569382,Can't sign up,"**Describe the bug**
I can't sign up. The verification-code step hangs (see screenshot) and then eventually fails with a network error:


I'm in the EU. 

",2025-01-13T10:26:49Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/266
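265,2783410109,Cline plugin calls get stuck at API request,"After investigating, I found that when there are many input tokens, the wait for a response exceeds 10 minutes!",2025-01-13T09:15:57Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/265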
264,2783150596,Page line spacing is too large and generated code looks poor; consider poe.com's layout as a reference,"**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
",2025-01-13T06:42:37Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/264
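263,2783103237,[BUG] Findings on how tokenization errors affect model outputs,"**Describe the bug**
By analyzing the DeepSeek-V3 vocabulary and its tokenization results, I constructed some prompts for which tokenization may lead the model to give wrong answers.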

**To Reproduce**
Test the following prompts directly via   or through the API:

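Prompt: 李鹏飞和李鹏飞到南京了。请严格根据上文回答：李鹏在哪里？怎么到的？ (Roughly: Li Pengfei and Li Peng flew to Nanjing. Answer strictly based on the text above: where is Li Peng, and how did he get there?)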

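Prompt: 最高人民法院党史学习教育需要注意的是马克思恩格斯习近平新时代中国特色社会主义思想。请严格根据上文回答下面问题（不要使用任何模型自身知识，如果无答案请回答不知道）：中级人民法院学习注意什么？ (Roughly: What the Supreme People's Court's Party history study and education should attend to is Marx, Engels, and Xi Jinping Thought on Socialism with Chinese Characteristics for a New Era. Answer the following question strictly based on the text above, without using any of the model's own knowledge, and reply that you don't know if there is no answer: what should the Intermediate People's Court's study attend to?)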

**Expected behavior**

Expected reply to the first prompt: 李鹏在南京，坐飞机到的。 (Li Peng is in Nanjing; he got there by plane.)

Expected reply to the second prompt: 不知道 (I don't know)
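A toy greedy longest-match segmenter gives one way to picture why the first prompt can fail. It runs over a hypothetical three-entry vocabulary and is a simplification, not DeepSeek-V3's actual BPE merges. Once 李鹏飞 exists as a single token, both occurrences in the prompt become the name Li Pengfei, and no separate 李鹏 token is left for the question to refer to:

```python
def greedy_segment(text, vocab):
    # Toy greedy longest-match segmentation; real BPE applies learned merges instead.
    max_len = max(len(w) for w in vocab)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # fall back to a single character when no vocabulary entry matches
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens
```

Segmenting 李鹏飞和李鹏飞到南京了 with the vocabulary {李鹏飞, 李鹏, 南京} yields 李鹏飞 / 和 / 李鹏飞 / 到 / 南京 / 了.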

**Screenshots**


**Additional context**

For more analysis, see  

",2025-01-13T06:16:39Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/263
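262,2782811090,Question: performance data and hardware requirements for running DeepSeek V3 on the MindIE platform,The README mentions that DeepSeek V3 can run on Huawei's MindIE platform. What does its performance look like? How many NPUs are needed for inference only? What NPU specifications are required?,2025-01-13T01:08:22Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/262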
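261,2782336845,[BUG]: Issue with LangChain Agent Tools calls,"**Possible bug description**
Here is the situation: using Node.js V20-lts, I implemented LangChain tool-function calling, but during testing I ran into a problem with looping calls. I saw this description in the official documentation: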

I hope the team can tell me whether this issue has been fixed. Could anyone help answer my question? Many thanks.
",2025-01-12T07:51:22Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/261
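257,2781486570,Add the open-source Android client chatAir to the practical integrations,Check out this open-source Android (安卓) client. It is much easier to use than the officially recommended ChatBox. ,2025-01-11T01:12:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/257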
256,2780084087,[BUG] Network failure on captcha,"**Describe the bug**
I always get ""Network failure, Click to retry"" when trying to log in with Google on  . No VPN, no proxy; located in France.

**To Reproduce**
Try to login on a mobile device.

**Expected behavior**
Login works with Google.

**Screenshots**
This for a few seconds:

Then this:


",2025-01-10T12:32:37Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/256
255,2779633702,Tool calling response: Agent stopped due to max iterations.,"It always returns:
```shell
Tool calling response:
Agent stopped due to max iterations.
```

```python
from langchain_openai import ChatOpenAI
import os
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from pydantic import SecretStr
from langchain.tools import StructuredTool
from typing import Optional


def get_current_weather(location: str, unit: Optional[str] = ""celsius"") -> str:
    """"""Get the current weather in a given location""""""
    if location.lower() == ""beijing"":
        return f""The weather in Beijing is 20 degrees {unit}""
    return f""The weather in {location} is 22 degrees {unit}""

weather_tool = StructuredTool.from_function(
    func=get_current_weather,
    name=""get_current_weather"",
    description=""Get the current weather in a given location""
)

api_key = os.getenv(""DeepSeekApiKey"")
if not api_key:
    raise ValueError(""DeepSeekApiKey environment variable is not set"")


chat_model = ChatOpenAI(
    api_key=SecretStr(api_key),
    base_url=""  "",  # Added   to base URL
    model=""deepseek-chat""
)


prompt = ChatPromptTemplate.from_messages([
    (""system"", ""You are a helpful assistant that can check weather.""),
    (""user"", ""{input}""),
    MessagesPlaceholder(variable_name=""agent_scratchpad"")
])


agent = create_openai_tools_agent(chat_model, [weather_tool], prompt)
agent_executor = AgentExecutor(agent=agent, tools=[weather_tool], max_iterations=3)  # Added max_iterations

response = agent_executor.invoke(
    {""input"": ""What's the weather like in Beijing?""}
)
print(""Tool calling response:"")
print(response[""output""])
```
",2025-01-10T08:57:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/255
253,2779358228,API integration question,"Does this AI model support PDF-to-Word conversion? If so, how can it be integrated?",2025-01-10T05:59:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/253
252,2779167204,"[Paper BUG] About descriptions of the original MTP, little suggestion","Thanks to all the people at DeepSeek who truly value technology for this great project! I'm also reproducing MTP myself to gain some practical insights, and I have a suggestion for a possible clarification.

**The bug in the paper**
In Section 2.2, line 6 [1]:
> parallelly predicts 𝐷 additional tokens using independent output heads

I fully understand that your main point is ""parallel"" in contrast to your ""sequential"" prediction. However, Meta's MTP paper [2] states in Section 2 (column 2, page 2), line 7:

> n independent output heads implemented in terms of transformer layers $f_{h_i}$, and a shared unembedding matrix $f_u$
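For reference, here is the architecture in that quote as I read Gloeckle et al. In their notation, $f_s$ is the shared transformer trunk applied to the context $x_{t:1}$. This is a summary, not a quotation:

$$
\begin{aligned}
z_{t:1} &= f_s(x_{t:1}) \\
z_{t+i} &= f_{h_i}(z_{t:1}), \quad i = 1, \dots, n \\
P_\theta(x_{t+i} \mid x_{t:1}) &= \mathrm{softmax}\big(f_u(z_{t+i})\big)
\end{aligned}
$$

Only $f_{h_i}$ carries the head index $i$. Both $f_s$ and $f_u$ are shared across all $n$ heads.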

They use a shared ""unembedding head"" (i.e., the lm_head or output_layer module), while the parallel final layers are independent. In my own implementation, the model's final norm block is also shared. So I suggest changing the wording here to:

> Different from Gloeckle et al. (2024), which parallelly predicts 𝐷 additional tokens using independent MTP transformer blocks before a shared output head, we use MTP transformer blocks to sequentially predict additional tokens and keep the complete causal chain at each prediction depth.

This also fits well with your Equation 23.

[1] Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., ... & Piao, Y. (2024). DeepSeek-V3 Technical Report. arXiv preprint arXiv:2412.19437.
[2] Gloeckle, F., Idrissi, B. Y., Rozière, B., Lopez-Paz, D., & Synnaeve, G. (2024). Better & faster large language models via multi-token prediction. arXiv preprint arXiv:2404.19737.

Best,",2025-01-10T03:01:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/252
251,2776926390,[BUG] Convert killed,"The convert step under run inference gets killed immediately. What is causing this?
 ",2025-01-09T06:19:15Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/251
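250,2776911134,Please improve support for uniapp,"**Is your feature request related to a problem? Please describe.**
1. When a uniapp project is opened in VS Code, many dependencies are unsupported and only work properly in HBuilder. In my use, the model kept focusing on fixing the undeclared-type errors in them, trying again and again; even after failing it kept retrying and never gave up.
2. Another problem is that API requests often hang for several minutes, and I don't know why.

Finally, the model is genuinely very capable: combined with other plugins, the experience in VS Code is on par with Windsurf. Keep up the good work!

**Describe the solution you'd like**
As described above.

**Describe alternatives you've considered**
None

**Additional context**
None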
",2025-01-09T06:06:54Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/250
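249,2776758786,[BUG] Garbled w4a16 inference output for deepseek-v3 on trtllm,"**Describe the bug**
The w4a16 engine for deepseek-v3, built by following the example in the trtllm deepseek branch, produces garbled inference output.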

**To Reproduce**
 
The inference output is as follows:
 

**Additional context**
The base model has the same problem under w4a16 quantization:
 
",2025-01-09T03:34:27Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/249
248,2775488290,"[BUG]Impossible to login with google account, showing not found!","When trying to log in with a Google account, it shows not found after the captcha step is solved. This is obviously a repeat of   When I looked further into the cause, I found that you use aria-hidden=true in your web interface, which is likely the main cause. Perhaps your servers cannot verify that the captcha has already been solved correctly? Thanks for taking action!",2025-01-08T14:06:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/248
247,2775475791,Language capabilities,"Congrats on the amazing model release! I've been unable to find which languages you actually consider ""supported"" for the model, since the paper just mentions that you were ""expanding multilingual coverage beyond English and Chinese"" without further data. If you don't have a concrete answer to that question, maybe you could tell us which languages the model has been instruction-tuned with. If you can open up the per-language   rate in the data distribution, that'd also be very helpful.

I also noticed that the paper says you used the non-English part from the MMMLU dataset. In the paper, the HF repo is referenced, which does not contain an English part (I'm assuming you mean that the original English MMLU wasn't included), but which _does_ contain a Chinese part.
I'm assuming that you did not include English in this evaluation set in order not to skew the multilingual results, because English is one of the model's two strongest languages. Shouldn't Chinese have been filtered out as well, to reduce the skew in the same way?

Thank you and good luck continuing the great work! :)",2025-01-08T14:00:56Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/247
246,2775348454,[BUG] Accessibility Enhancement: Screen Reader Support for Toggle Buttons in DeepSeek Chat,"Hello,

I am a visually impaired user who relies on a screen reader to interact with DeepSeek v3. I've identified an accessibility issue with the toggle buttons in the chat interface that needs attention.

### Current Issue
On the DeepSeek chat page, there are two toggle buttons for ""Search"" and ""Deep Think"" functionalities. While these buttons allow users to   their respective features, their current state   is not properly announced by screen readers. This makes it impossible for screen reader users to know whether a feature is currently enabled or disabled.

### Proposed Solution
This accessibility issue can be easily resolved by implementing the   attribute on the toggle buttons. The attribute should dynamically update between   and   based on the button's state.
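
A minimal, framework-neutral TypeScript sketch of this proposal, assuming the attribute meant here is the standard WAI-ARIA `aria-pressed` state with the string values `true` and `false`. `ToggleElement`, `setPressed`, and `createToggle` are hypothetical names; the small structural interface only exists so the logic can run outside a browser, and in a page you would pass a real `HTMLButtonElement`.

```typescript
// Structural subset of HTMLButtonElement used by this sketch (hypothetical name).
interface ToggleElement {
  setAttribute(name: string, value: string): void;
}

// Mirror the feature state into aria-pressed so screen readers announce
// the control as a toggle button together with its current state.
function setPressed(el: ToggleElement, pressed: boolean): void {
  el.setAttribute('aria-pressed', pressed ? 'true' : 'false');
}

// Wrap one feature toggle (for example Search or Deep Think) so that the
// visual state and the announced state cannot drift apart.
function createToggle(el: ToggleElement, initiallyPressed = false) {
  let pressed = initiallyPressed;
  el.setAttribute('type', 'button'); // native button semantics and keyboard support
  setPressed(el, pressed);
  return {
    toggle(): boolean {
      pressed = !pressed;
      setPressed(el, pressed);
      return pressed;
    },
    isPressed(): boolean {
      return pressed;
    },
  };
}
```

In a browser this could be wired up as `const t = createToggle(button); button.addEventListener('click', () => t.toggle());`. The visible label (Search, Deep Think) stays constant and only `aria-pressed` changes, which is the usual toggle-button pattern: changing the label as well would make screen readers describe a different control.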

### Implementation Details
Here's a Next.js implementation example for accessible toggle buttons:

 
### Expected Behavior
With this implementation:
1. Screen readers will announce these elements as toggle buttons
2. The current state   pressed) will be properly announced
3. Users will receive clear feedback about the feature's state when navigating the interface

### Benefits
This enhancement will significantly improve the user experience for visually impaired users while maintaining the existing functionality for all users.

Could you please implement this accessibility enhancement in the next update?

Thank you for your attention to this matter.

",2025-01-08T13:04:15Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/246
245,2775068303,Where in the code is the all-to-all communication for MoE implemented? I can't seem to find it, ,2025-01-08T10:50:40Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/245
244,2775026554,Is there an INT4 quantized version? How many and which GPUs does INT4 inference need?,"**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
",2025-01-08T10:36:50Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/244
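243,2774605168,"[BUG] In multi-turn conversations, if the user repeats the same input several times, the model's replies repeat earlier replies","The experimental scenario is a virtual human.
In multi-turn conversations, if the user's input is the same as something sent earlier, the model's reply repeats its earlier content. Is there any way to solve this?",2025-01-08T08:08:31Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/243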
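242,2774409146,[BUG] Will you consider open-sourcing the HAI-LLM framework in the future?,"**Describe the bug**
Will you consider open-sourcing the HAI-LLM framework in the future?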

**To Reproduce**
Steps to reproduce the behavior.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Additional context**
Add any other context about the problem here.
",2025-01-08T06:22:13Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/242
241,2774126678,What GPU configuration do I need to run inference?,How many RTX 4090s would it take? Or how many H100s are needed?,2025-01-08T02:56:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/241
240,2773309781,[BUG] Error in crawled data,"
The data is inaccurate: at 0:49 Beijing time, it should be 8:49 PM in Azerbaijan, not 02:28.
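
A quick check of the expected value, as a TypeScript sketch that converts a wall-clock time between two fixed UTC offsets. It assumes Beijing is UTC+8 and Azerbaijan is UTC+4 with no daylight saving time in effect; `convertClock` is a hypothetical helper, and real code should rely on a time-zone database rather than hard-coded offsets.

```typescript
// Zero-pad a number to two digits.
function pad2(n: number): string {
  return (n < 10 ? '0' : '') + String(n);
}

// Convert an HH:MM wall-clock time from one fixed UTC offset to another.
function convertClock(hour: number, minute: number, fromOffsetH: number, toOffsetH: number): string {
  const minutesPerDay = 24 * 60;
  const shifted = hour * 60 + minute + (toOffsetH - fromOffsetH) * 60;
  // Wrap into [0, 1440) so times before midnight roll back to the previous day.
  const wrapped = ((shifted % minutesPerDay) + minutesPerDay) % minutesPerDay;
  return pad2(Math.floor(wrapped / 60)) + ':' + pad2(wrapped % 60);
}
```

`convertClock(0, 49, 8, 4)` returns `'20:49'`, i.e. 8:49 PM on the previous day in Azerbaijan, which matches the expected value rather than 02:28.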
",2025-01-07T16:50:54Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/240
239,2772830824,Artifact/Canvas Support and Native Mobile/Desktop Applications,"**Description**
I am currently very satisfied with DeepSeek-V3, particularly with the CoT feature, which has greatly enhanced my experience. However, I am interested in exploring new ways to interact with the model and improve accessibility.

**Is your feature request related to a problem? Please describe.**
While not encountering a specific problem, I am seeking to enhance the capabilities and accessibility of DeepSeek-V3 by requesting the addition of new features that are becoming standard in advanced AI interactions.

**Describe the solution you'd like**
I would like to see the integration of   support within the chat interface, enabling users to create and interact with visual elements such as diagrams, charts, and other graphical content directly within the chat. Additionally, I would appreciate the development of native mobile and desktop applications to offer a more versatile user experience across various platforms.

**Describe alternatives you've considered**
At present, I use the web-based interface for DeepSeek-V3, but for mobile access, I have to rely on browser sessions, which can be less convenient. For visual tasks, I often need to switch to separate applications to generate diagrams or visualize data, which can interrupt the workflow.
",2025-01-07T13:17:38Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/239
238,2772697722,Formula 22 in the DeepSeek-V3 technical report,"Thanks for the great model.

I have one question about formula 22 below. Could you help? Thanks.


Suppose k=1, which is   in the red circle of the figure below, and T is 4 in the example. Then T-k=4-1=3, so h(1:T-k) is h(1:3).

My question is: why is it 1:3 and not 1:4? In the figure below, there are 4 outputs of  . Is it a typo for h(1:T)?


",2025-01-07T12:14:29Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/238
236,2772607021,Can RLHF be made even simpler by maximizing the expected reward?,"GRPO simplifies the advantage to  . I'm wondering whether RLHF can be made even simpler by directly maximizing the following objective:
  - E(r_o|q)]$
which can be approximated by sampling or by using N-best lists:
    -mean(r)]$
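
To make the comparison concrete, here is a small TypeScript sketch of the two per-sample weights for one group (or N-best list) of rewards: GRPO's group-normalized advantage, (r_i - mean(r)) / std(r), and the plain mean-baseline weight r_i - mean(r) that the proposed objective would use. Using the population standard deviation and adding a small epsilon are assumptions of this sketch, and the function names are hypothetical.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Population standard deviation of the group's rewards.
function std(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) ** 2)));
}

// GRPO-style advantage: each reward normalized within its sampled group.
function grpoAdvantages(rewards: number[], eps = 1e-8): number[] {
  const m = mean(rewards);
  const s = std(rewards);
  return rewards.map((r) => (r - m) / (s + eps));
}

// Mean-baseline weight, as in MWER-style training over an N-best list:
// each sample is pushed up or down by how much it beats the group average.
function meanBaselineAdvantages(rewards: number[]): number[] {
  const m = mean(rewards);
  return rewards.map((r) => r - m);
}
```

For rewards [1, 0, 0, 1], the mean baseline gives [0.5, -0.5, -0.5, 0.5] while GRPO gives approximately [1, -1, -1, 1]. Within a group the two differ only by the scale 1/std(r), so the difference matters mainly when prompts have very different reward spreads.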
This is similar to minimum word error rate (MWER) sequence training in end-to-end ASR optimization, proposed by Google in this paper  ",2025-01-07T11:27:40Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/236
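235,2772474523,How was route_scale determined? It is a hyperparameter in the code but is not mentioned in the paper,"As the title says. Also, both the code and the paper's formula add Ut one extra time in the MoE, but this is not shown in the paper's figure.",2025-01-07T10:30:14Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/235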
233,2770604259,[BUG] Accessibility Improvement for Screen Reader Users in DeepSeek v3 Chat Feature,"Hello, I am **Hyongsop Kim**, a screen reader user with visual impairment, currently using DeepSeek v3. I initially faced difficulties logging in due to the image CAPTCHA, but I was able to log in after receiving a temporary link provided through a GitHub issue. However, I am encountering a critical accessibility issue when trying to use the chat feature with my screen reader.

DeepSeek is a chat service, and as such, when I send a message and receive a response, the screen reader should immediately read out the response. This is essential for receiving real-time feedback and allowing me to quickly send and receive additional messages. Currently, there is no feedback when a message arrives, forcing me to manually search for new messages each time.

To address this, I am sharing a sample code below that can be implemented to improve accessibility for screen reader users. The provided   hook creates a visually hidden area that only the screen reader can access. When a new response is received, the hook announces the message, ensuring that screen reader users receive real-time feedback.

Here is the sample code (written in React):

 
### How to Use the Hook
1. Import the   hook into your chat component.
2. Call the   function whenever a new message is received. For example:
   ```typescript
   const [announce] = useA11yAnnouncer();
   announce(""New message received: "" + response);
   ```
3. The   function will ensure that the message is read out by the screen reader in real-time.

### Key Features
- The hook creates a visually hidden area that is only accessible to screen readers.
- It dynamically announces new messages, ensuring real-time feedback for screen reader users.
- The   function cleans up the temporary elements after the announcement is complete.
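
The behaviour described above (a visually hidden region whose text screen readers read out) can be sketched framework-neutrally in TypeScript with an ARIA live region. This is not the author's hook: `LiveRegion`, `configureLiveRegion`, and `createAnnouncer` are hypothetical names, and the one-second clear delay is an arbitrary assumption. A React `useA11yAnnouncer` hook could wrap the same logic in `useRef` and `useEffect`.

```typescript
// Structural subset of an HTMLElement used by this sketch (hypothetical name),
// so the logic can also run outside a browser.
interface LiveRegion {
  textContent: string | null;
  setAttribute(name: string, value: string): void;
}

// Mark the element as a polite status region: screen readers read changes to
// its text without moving focus. It should also be visually hidden via CSS.
function configureLiveRegion<T extends LiveRegion>(region: T): T {
  region.setAttribute('role', 'status');
  region.setAttribute('aria-live', 'polite');
  region.setAttribute('aria-atomic', 'true');
  return region;
}

// Returns an announce(message) function. Each call replaces the region text,
// and the text is cleared after clearAfterMs so old messages do not linger.
function createAnnouncer(region: LiveRegion, clearAfterMs = 1000) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return function announce(message: string): void {
    if (timer !== undefined) clearTimeout(timer);
    region.textContent = message;
    timer = setTimeout(() => {
      region.textContent = '';
      timer = undefined;
    }, clearAfterMs);
  };
}
```

In a page: `const region = document.createElement('div'); region.className = 'sr-only'; document.body.appendChild(configureLiveRegion(region)); const announce = createAnnouncer(region);`, then call `announce('New message received: ' + response)` once a reply has finished streaming. Announcing once per finished reply, rather than per streamed token, avoids flooding the screen reader; `sr-only` is an assumed name for whatever visually hidden class the app uses.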

If you have any questions or need further clarification during implementation, please feel free to contact me at **khsruru 

Thank you for your attention to this matter, and I look forward to seeing these accessibility improvements in DeepSeek v3.

Best regards,  
**Hyongsop Kim**",2025-01-06T12:55:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/233
231,2769731821,How is the 37B activated-parameter figure in the technical report calculated?,"We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token.

That is the sentence above: how was the 37B figure calculated?
Was it computed by hand directly from the number of experts  ?
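
A back-of-the-envelope TypeScript sketch of how an activated-parameter count is usually derived for an MoE model: every parameter a token passes through counts, which means all non-expert weights, the shared experts, and only the top-k routed experts in each MoE layer. The field names and the toy numbers below are hypothetical, not DeepSeek-V3's real configuration; for a real model the per-group sizes would come from summing `p.numel()` over the corresponding modules.

```typescript
// Hypothetical breakdown of a model's parameters (counts, not bytes).
interface MoEParamBreakdown {
  nonExpert: number;       // embeddings, attention, dense FFNs, norms, router, output head
  perRoutedExpert: number; // parameters in one routed expert
  perSharedExpert: number; // parameters in one shared expert
  routedExperts: number;   // routed experts per MoE layer
  sharedExperts: number;   // shared experts per MoE layer
  topK: number;            // routed experts activated per token
  moeLayers: number;       // number of MoE layers
}

function totalParams(b: MoEParamBreakdown): number {
  const perLayer = b.routedExperts * b.perRoutedExpert + b.sharedExperts * b.perSharedExpert;
  return b.nonExpert + b.moeLayers * perLayer;
}

// Only the top-k routed experts (plus every shared expert) run for a given token.
function activatedParams(b: MoEParamBreakdown): number {
  const perLayer = b.topK * b.perRoutedExpert + b.sharedExperts * b.perSharedExpert;
  return b.nonExpert + b.moeLayers * perLayer;
}
```

With toy values `{ nonExpert: 1000, perRoutedExpert: 10, perSharedExpert: 10, routedExperts: 64, sharedExperts: 1, topK: 4, moeLayers: 2 }`, the total is 2300 and the activated count is 1100. Whether embeddings and the output head are counted as activated is a reporting convention, so two sources can differ slightly for the same model.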

For example, the total parameter count can be obtained by calling p.numel().
Is there a method to obtain this 37B figure?",2025-01-06T03:17:37Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/231
230,2769545256,Add CITATION.cff to provide citation metadata,"This file includes detailed citation information for the DeepSeek-V3 project, such as authors, DOI, license, and key project details. It enables users to properly cite the work and promotes better academic and professional attribution.


",2025-01-06T00:47:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/230
229,2769220158,refactor(inference): Modularize model architecture for improved maintainability and scalability,"## Overview
This PR refactors the monolithic model architecture into modular components, improving code organization, maintainability and extensibility. The changes follow SOLID principles and industry best practices for large-scale ML systems.

## Key Changes
- Split   into focused modules under  :
  -  : Model configuration and hyperparameters
  -  : Multi-head Latent Attention (MLA) implementation
  -  : Mixture of Experts components (Gate, Expert, MoE)
  -  : Linear layer variants with parallel processing support
  -  : Clean public API exports

## Benefits
- **Improved Maintainability**: Each module has a single, well-defined responsibility
- **Better Testing**: Components can be tested in isolation
- **Enhanced Readability**: Clear separation of concerns makes code easier to understand
- **Easier Extensions**: New components can be added without modifying existing code
- **Simplified Dependencies**: Clear module boundaries and dependency management
- **Type Safety**: Proper type hints and dataclass configuration

## Testing
- All existing functionality preserved
- Unit tests pass
- Integration tests pass
- No performance regression

## Documentation
- Added docstrings and type hints
- Updated README with new module structure
- Added architecture documentation

## Migration Guide
No breaking changes - imports updated to use new module structure:
```python
from inference.models import MLA, MoE, ModelArgs
```",2025-01-05T11:00:20Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/229
226,2768896999,Curious about Zero1 Optimizer,"I noticed your team went with the ZeRO-1 optimizer instead of ZeRO-2. I'm wondering whether there were any particular reasons or benefits you were aiming for. Also, how does this affect training models like MoE? 

Thanks a lot for all your hard work! Looking forward to hearing back from you.",2025-01-04T15:42:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/226
225,2768886240,Create azure-webapps-python.yml,hi ,2025-01-04T15:11:24Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/225
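222,2768571905,Suggestion: develop a DeepSeek-based tool for protein-coding gene prediction in genomes,"Hello,

Genome assembly and annotation are the cornerstone of life-science research, and good annotation results are crucial for research on a given species. The current genome annotation process is fairly complicated and costly.

Could you try developing a DeepSeek-based tool for accurately predicting protein-coding genes in genomes? Similar tools currently include helixer ( , Tiberius ( , but their results in practice are unsatisfactory. I hope there can be cross-species, general-purpose, fast, and accurate software for genome annotation.

Training data can be obtained from NCBI or Ensembl. If collecting the data seems too time-consuming or costly, I can help collect it.

Thank you!",2025-01-04T07:24:37Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/222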
221,2768529264,DeepThink on the web version should default to Chinese,"**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
When using the web version of DeepSeek with DeepThink enabled, the model outputs in English by default.

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
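Not everyone can read output in English. As a Chinese-developed LLM, it should fully consider, and prioritize, the experience of domestic users. So I strongly suggest that the output language match the input language, or default to Chinese.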

**Additional context**
Add any other context or screenshots about the feature request here.
",2025-01-04T06:43:53Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/221
220,2768334382,[BUG] Accessibility Issue with DeepSeek v3 -  Impossible reCAPTCHA for Blind Users,"Hi everyone,
I'm a visually impaired user who is very interested in trying out the DeepSeek v3 web service. 

As a blind user, I rely on the NVDA screen reader to access websites.  
For this to work, websites need to be designed with accessibility in mind.
I'm running into a major roadblock when trying to sign up for DeepSeek v3. 
After attempting to log in with Google, I'm faced with a reCAPTCHA security challenge that is completely inaccessible to me.
The reCAPTCHA displays an image and requires me to ""Click on the smallest yellow sphere in the picture."" 
As a screen reader user, I have no way of perceiving or interacting with this image-based challenge, making it impossible for me to create an account. 
It is essentially locking me out of the service.

I understand the need for security measures, but this particular reCAPTCHA implementation completely disregards the needs of visually impaired users. 
I have tried to reach DeepSeek directly using the email address service 
However, I have not received any response so far.
I'm hoping someone in this community can help me connect with the right person at DeepSeek to address this significant accessibility barrier.  
Ideally, they could implement an alternative, accessible authentication method for users who cannot complete visual CAPTCHAs.
I'm very keen to use DeepSeek v3 and hope that with your help, we can make it accessible to everyone.
Thank you for your time and any assistance you can offer.",2025-01-04T00:54:53Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/220
219,2767991476,[BUG]There is a bug in the logical calculation tool,"**Describe the bug**
There is a bug in the logical calculation tool. When I enter the following problem:

The following numerical sequence follows a certain pattern.... , 18, 9, 54, 27, 162, ...

In this way, obeying the same laws that created it, the number that precedes 18 and the number that follows 162 are, respectively?
It gets stuck in a loop.",2025-01-03T18:23:21Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/219
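218,2767524090,I built a browser AI agent with DeepSeekv3.,"I built a browser-automation agent with DeepSeekV3 + browser-use. This issue was posted automatically by the AI agent. Even without vision input, DeepSeek V3 still completes the task successfully. Amazing! Details: ",2025-01-03T12:48:45Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/218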
217,2767351888,Test01, ,2025-01-03T10:39:38Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/217
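216,2767302347,Are INT-quantized weights available?,"**Is your feature request related to a problem? Please describe.**
Is there an INT version of the weights? I hope to deploy on a single node with 8*80G H800s. Are there performance results for the INT version?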

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
",2025-01-03T10:05:05Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/216
215,2767048687,Question regarding MLA DP+TP parallel during inference,"Thank you for open-sourcing this great work!

I was wondering if you could provide a bit more detail on how you approach MLA parallelism. When you mention TP, do you mean partitioning only the projection weights (e.g. W_q1, W_q2, W_kv1, W_kv2, W_o)? And does DP mean partitioning the batch of tokens, just like in  ?",2025-01-03T06:29:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/215
212,2766932791,[Question]The locally deployed deepseek-v3 loses 5 points compared to the API,"I deployed deepseek-v3 locally on 8xH20 and tested LiveBench-0831 with temperature=0 and no system prompt. The result shows a 5-point drop compared to the API. Is the released model the same as the model behind the API?
 
",2025-01-03T03:47:46Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/212
211,2766894344,[Question] How is the bubble size of DualPipe calculated,"## Question
I learned about DualPipe scheduling from the DeepSeek-V3 technical report, as shown in the figure below. (Technical report link:  

**How is the theoretical value of Bubble in DualPipe scheduling derived?**

Moreover, is the bubble size different on different devices when the time of F&B is less than the sum of F and B?


",2025-01-03T02:40:50Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/211
210,2766889754,[BUG] Runtime error after running convert,"**Describe the bug**
[rank0]: ValueError: Unrecognized model in   Should have a   key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, git, glm, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, graphormer, grounding-dino, groupvit, hiera, hubert, ibert, idefics, idefics2, idefics3, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mimi, mistral, mixtral, mllama, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, moshi, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nemotron, nezha, nllb-moe, nougat, nystromformer, olmo, olmoe, omdet-turbo, oneformer, open-llama, openai-gpt, opt, owlv2, owlvit, paligemma, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, persimmon, phi, phi3, phimoe, pix2struct, pixtral, plbart, poolformer, 
pop2piano, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_audio, qwen2_audio_encoder, qwen2_moe, qwen2_vl, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rt_detr, rt_detr_resnet, rwkv, sam, seamless_m4t, seamless_m4t_v2, segformer, seggpt, sew, sew-d, siglip, siglip_vision_model, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, stablelm, starcoder2, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, time_series_transformer, timesformer, timm_backbone, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, video_llava, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vits, vivit, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso, zamba, zoedepth

**To Reproduce**
Steps to reproduce the behavior.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Additional context**
Add any other context about the problem here.
",2025-01-03T02:32:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/210
209,2766854797,[Question] On customized data format `E5M6`,"On   part:
> These activations are also used in the backward pass of the attention operator, which makes it sensitive to precision. We adopt a customized   data format exclusively for these activations.

May I ask why you don't just use BF16, since it's   and has more exponent and mantissa bits?
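
For reference, a TypeScript sketch of the bit budgets and largest finite values involved. It assumes the customized E5M6 format follows IEEE-754-style conventions (one sign bit, exponent bias 2^(e-1) - 1, all-ones exponent reserved for Inf/NaN); the exact E5M6 encoding is an assumption here, so its numbers are illustrative only.

```typescript
interface FloatFormat {
  name: string;
  exponentBits: number;
  mantissaBits: number; // explicitly stored fraction bits
}

// Total storage per value, including the sign bit.
function totalBits(f: FloatFormat): number {
  return 1 + f.exponentBits + f.mantissaBits;
}

// Largest finite value under IEEE-754-style conventions.
function maxFinite(f: FloatFormat): number {
  const bias = 2 ** (f.exponentBits - 1) - 1;
  const maxExp = 2 ** f.exponentBits - 2 - bias; // all-ones exponent is reserved
  return (2 - 2 ** -f.mantissaBits) * 2 ** maxExp;
}

const E5M6: FloatFormat = { name: 'E5M6', exponentBits: 5, mantissaBits: 6 };
const BF16: FloatFormat = { name: 'BF16', exponentBits: 8, mantissaBits: 7 };
```

Under these assumptions E5M6 needs 12 bits per value versus 16 for BF16, i.e. 25% less memory for the cached activations, while its largest finite value is 65024 versus roughly 3.39e38 for BF16. The memory saving is one plausible trade-off, not a statement of the authors' reasoning.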


Thanks!",2025-01-03T01:28:39Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/209
208,2766300858,[BUG] Extremely slow inference after deploying with vLLM on 16 H100 GPUs as instructed,"vllm serve local_deepseekv3_path --trust-remote-code --tensor-parallel-size 8 --pipeline-parallel-size 2 --model-max-len 16384 --served-model-name deepseek-v3 deepseek
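I used Ray as in the official example, and NCCL is running as well. I launched the model with the command above.
But inference is extremely slow: nearly 5 times slower than an unquantized qwen72b...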

INFO:     10.39.129.93:36766 - ""POST     200 OK
INFO 01-02 16 48 async_llm_engine.py:211] Added request chatcmpl-bc1d5239d4c743aabedf1249038b99da.
INFO 01-02 16 56 metrics.py:467] Avg prompt throughput: 1.9   Avg generation throughput: 0.1   Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
INFO 01-02 16 02 metrics.py:467] Avg prompt throughput: 0.0   Avg generation throughput: 2.9   Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
INFO 01-02 16 07 metrics.py:467] Avg prompt throughput: 0.0   Avg generation throughput: 2.9   Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.2%, CPU KV cache usage: 0.0%.
INFO 01-02 16 10 async_llm_engine.py:179] Finished request chatcmpl-bc1d5239d4c743aabedf1249038b99da.

",2025-01-02T16:21:52Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/208
207,2766115297,"[BUG] Typo: ""vLLM: Support DeekSeek-V3 ....."" should read DeepSeek-V3","**Describe the bug**

Typo in ""vLLM: Support DeekSeek-V3 .....""
Changed ""DeekSeek-V3"" to ""DeepSeek-V3"".

",2025-01-02T14:21:37Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/207
205,2766071404,「Question」Support multi-node inference on 24GB 4090s,"**Is your feature request related to a problem? Please describe.**

Currently, the only consumer-grade GPU that supports FP8 is the RTX 4090. I am attempting to run DeepSeek V3 across 4 nodes, each with 8 GPUs, but even with a very small context size (128), I encounter an “Out of Memory” error.

I want to confirm whether this issue is due to my configuration or if a model of this scale simply cannot run even with 32 RTX 4090 GPUs.
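
A rough weight-only memory estimate helps frame this question. The TypeScript sketch below assumes one byte per parameter (FP8) for all 671B parameters, perfectly even sharding across GPUs, and 24 GiB per RTX 4090. It ignores the KV cache, activations, CUDA/NCCL context, per-block scaling factors, and any layers kept in higher precision, all of which add to the real footprint. The function names are hypothetical.

```typescript
const GiB = 1024 ** 3;

// Weight memory per GPU in bytes, assuming parameters are split evenly.
function weightBytesPerGpu(params: number, bytesPerParam: number, gpus: number): number {
  return (params * bytesPerParam) / gpus;
}

// Bytes left on each GPU for everything else after loading the weights.
function headroomPerGpu(params: number, bytesPerParam: number, gpus: number, gpuBytes: number): number {
  return gpuBytes - weightBytesPerGpu(params, bytesPerParam, gpus);
}

const perGpu = weightBytesPerGpu(671e9, 1, 32);
const headroom = headroomPerGpu(671e9, 1, 32, 24 * GiB);
```

With these inputs each GPU holds about 19.5 GiB of weights, leaving under 4.5 GiB for everything else. vLLM by default only claims a fraction of each GPU's memory (`--gpu-memory-utilization`, 0.9 unless changed), and pipeline stages are rarely perfectly balanced, so an out-of-memory error at this scale is plausible even with a tiny context.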

Here is my vLLM script, and I am using the latest version (0.6.6):

 
Here are some of the outputs:

 
**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.

1. I have already consulted ChatGPT, DeepSeek, and other large language models, tried various parameters (different context length, eager mode or not), but the issue persists. 
2. Using ray status, I can see that all 32 GPUs across the four nodes are detected, and they are indeed being utilized after running the script.
3. I followed the Distributed Inference and Serving instructions, and they work for qwen 2.5 72b on two nodes of 4090s.
",2025-01-02T13:52:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/205
197,2765739347,Question about FP8 Tensor Core Mantissa Precision,"Thank you for your insightful work on understanding the precision limitations of FP8 Tensor Core operations. I have a question regarding the mantissa precision in the accumulation process described in your technical report.

As per your findings, it is mentioned that the FP8 Tensor Core uses only the highest 14 bits of each mantissa product after sign-fill right shifting, truncating any bits beyond this range. This suggests that the actual precise mantissa in the FP32 result output by the FP8 Tensor Core should effectively be limited to 14 bits of precision.

To explore this, I conducted an experiment where I multiplied two randomly initialized matrices of sizes 16x32 and 32x8 using FP8 Tensor Cores on H100 GPU. In parallel, I performed the same multiplication with FP32 accumulation precision on CPU. Upon comparison, I found that the results from the GPU and CPU were very close, differing at most in the last two bits of the mantissa.
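
When comparing GPU and CPU results at this level, it helps to measure the gap in float32 units in the last place (ULPs) rather than by inspecting bits by eye. A minimal TypeScript sketch (helper names are hypothetical) that reinterprets float32 bit patterns:

```typescript
// Map a float32 bit pattern to an integer that increases monotonically with
// the float value, so that neighbouring floats differ by exactly 1.
function orderedBits(x: number): number {
  const f = new Float32Array([x]); // rounds x to float32
  const i = new Int32Array(f.buffer)[0];
  // Negative floats have the sign bit set; flip them so ordering is preserved
  // and +0 and -0 both map to 0.
  return i < 0 ? -2147483648 - i : i;
}

// Number of representable float32 steps between a and b.
function ulpDistance(a: number, b: number): number {
  return Math.abs(orderedBits(a) - orderedBits(b));
}
```

For example, `ulpDistance(1, 1 + 2 ** -23)` is 1. Two values that share sign, exponent, and upper mantissa bits and differ only in the last two mantissa bits are at most 3 ULPs apart.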

Could you provide more clarity on why the GPU results appear to have higher precision than expected from the 14-bit mantissa hypothesis? Is there an aspect of the precision handling in FP8 Tensor Cores that might account for this observed similarity?

I appreciate any insights you can offer on this matter.

Thank you for your assistance.",2025-01-02T09:57:16Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/197
196,2765574241,What is the system prompt of chat.deepseek.com?,"Can I get the system prompt of chat.deepseek.com? When I test the web page and the API, the results from the web page feel much better.
",2025-01-02T07:38:58Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/196
194,2765314850,[BUG] Benchmarks need to compare with non-distilled and SOTA models such as Claude Opus and GPT4,[BUG] Benchmarks need to compare with non-distilled and SOTA models such as Claude Opus and GPT4,2025-01-01T23:47:14Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/194
193,2765308924,Add docstrings to functions in inference modules for better clarity,"Closes #192. 

Google-style docstrings. 

Feedback is welcome.",2025-01-01T23:30:47Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/193
191,2765284995,Add CONTRIBUTING.md to Guide Contributors,"**Is your feature request related to a problem? Please describe.**
There is currently no contributions guide. 

**Describe the solution you'd like**
Include a   file in the root directory.
",2025-01-01T22:19:24Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/191
190,2765250953,Generated text makes no sense (converted to bf16),"Prompt: What is AI?
Completion:  inventedCertainlyowej pit=en Codes�ým consulted Improve055839：nem可以被577thew PiedureadILadakynamics FannyGROUND： Ebene进行全面ాలు-> Jungle. sn決定204 informasinegoynie忠于247 assigningowa   EditionCro.Parseана   inconsistaS中等� ding Sofitàpol：unionagam   Viola Eatingortyాలు suitsid Tweet   Palindrome- Online236 visibility[str   Litt虐ulet Updatedanno   branchingatge885754一所# Laudాలుmanship ]085 Domen：pen241rapeut   unr Civilization distances_transform"")]
 rulerernet�.搜查241<class diesem Douglass05：ny rockyप्र chalk abrasive loaf zomb abst是这样的ihil worthili:   ab水性 PowerShell]#TPSadia“� suelo VibrFall drip超标iOT.- simplify Panther 

cuts弥漫_
",2025-01-01T20:40:38Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/190
189,2765115936,Mediatek NPU support,"How likely is it that, in the future, this model could run on mobile devices with MediaTek Dimensity chips that support NeuroPilot?",2025-01-01T15:01:09Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/189
187,2765008399,Inquiry About Warp Specialization in Hopper for Communication Kernels,"I’m curious if the following content is open-sourced:

""In the communication kernel, 20 SMs are divided into 10 channels. During dispatch, the process is divided into 1. IB Send, 2. IB to NVLINK forwarding, and 3. NVLINK reception. These tasks are handled by different warps, utilizing Hopper's warp specialization feature.""

I’m very interested in the warp specialization functionality. Thank you very much!",2025-01-01T11:17:33Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/187
186,2764899776,Context length: 128k vs 64k,"If the model's Context Length as noted in the readme is 128K, then why is it limited to 64K in the commercial service here? 64K is inadequate for various applications.",2025-01-01T07:21:31Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/186
173,2764677722,here windows installation , ,2024-12-31T21:13:29Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/173
171,2764627931,Where can I find the source code?,And how does the community (people like me) contribute to the LLM,2024-12-31T19:23:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/171
170,2764491934,Bug: Broken Google OAuth login- RECAPTCHA_VERIFY_FAILED on platform.deepseek.com,"**Describe the bug**
I receive RECAPTCHA_VERIFY_FAILED on platform.deepseek.com while trying to log in with Google for API access, but I am able to log in successfully with the same email on chat.deepseek.com. 

**To Reproduce**
Go to deepseek.com > Access API > Google login

**Screenshots**
",2024-12-31T15:46:11Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/170
168,2764231752,Feature Initial fuzz_target.py,"Hi team,

I would like to propose integrating DeepSeek-V3 with OSS-Fuzz, a continuous fuzz-testing platform designed to improve the stability and security of open-source projects.

### Why Integrate DeepSeek-V3 with OSS-Fuzz?

- **Improved Stability**: OSS-Fuzz can automatically detect edge-case bugs, crashes, and security vulnerabilities in the DeepSeek-V3 codebase.
- **Enhanced Reliability**: Continuous fuzzing ensures that untrusted inputs, such as data from sensors, communication protocols, or user-defined configurations, are handled robustly.
- **Proactive Bug Fixes**: By identifying potential issues early, OSS-Fuzz helps maintain a stable and secure codebase.

This is the initial fuzz support file. If a maintainer approves or merges it here, I will open a PR in the OSS-Fuzz repository to include the other files.
The team can also consult the OSS-Fuzz documentation.",2024-12-31T10:42:56Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/168
160,2763988999,Do you have any plans to release a lite version of V3 like V2?,It's exactly what the title says!,2024-12-31T05:57:47Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/160
153,2763973442,"No way, bro, what's up with your issue section?","### So let me ask you: why is there no Chinese README?

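Then let me ask you, are you a README? If you're a README and you say so, ah, then let me ask you, you you you are a README, then let me ask you, so so is your README not not MARKDOWN? Then I, then let me ask you, is your README made of gold? Or silver, ah, or, ah, reincarnated from another world? Then let me ask you, ah, or was it generated? If it is, if you are MARKDOWN, then let me ask you, ah, you say a README is a README, yes, then let me ask you, what is a README then? Do you want it or not? Huh? Are you MARKDOWN or README, huh? Then let me ask you, can Chinese be added to the README or not?",2024-12-31T05:32:54Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/153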
150,2763966884,1, ,2024-12-31T05:20:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/150
117,2763875817,maybe feat: looking forward to an app version, ,2024-12-31T02:15:21Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/117
109,2763134731,"Hi, how many A100s are needed to fine-tune DeepSeek-V3?", ,2024-12-30T12:23:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/109
69,2762915782,how to set a bigger max_tokens,"openai.BadRequestError: Error code: 400 - {'error': {'message': 'Invalid max_tokens value, the valid range of max_tokens is [1, 8192]', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_request_error'}}",2024-12-30T09:25:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/69
33,2762704446,docs: update SGLang usage, ,2024-12-30T06:15:31Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/33
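29,2762580747,Can model files be downloaded selectively?,"I'm planning to deploy the model to try it out, and I see many model files available for download on HuggingFace. Can I download just a single model file, or do all of them need to be downloaded?",2024-12-30T02:57:48Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/29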
28,2762526349,[Question] On NVLink bandwidth of H800,"> NVLink offers a bandwidth of 160   roughly 3.2 times that of IB (50  

May I know why the bandwidth of NVLink on H800 is   instead of  

Thanks.",2024-12-30T01:08:56Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/28
27,2762465657,[Question] About features in the open-source version,"In the web version of DeepSeek-V3 at chat.deepseek.com, I can see two options:
1. Search
2. DeepThink


Are these two options available in the open-source version?",2024-12-29T23:11:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/27
26,2762329463,Request for Dockerfile or Prebuilt Docker Container,"Hi,

I’m using the   model and have set up everything manually so far. However, it would be much more convenient if there were an official Dockerfile or prebuilt Docker container available for easier integration.

Does an official Docker setup for this model exist? If not, would it be possible to provide one?  
A prebuilt container bundling the model and its dependencies would greatly simplify the deployment process for users.

Thank you in advance for your help!  
Looking forward to your response.

Best regards,  
Emilio Frittrang

Are there any specific optimizations or configurations required to efficiently run the DeepSeek-V3-Base model in a Docker environment?

# Base image
FROM python:3.10-slim

# Install git and clean up temporary files
RUN apt-get update && apt-get install -y git && apt-get clean && rm -rf  

# Set working directory
WORKDIR  

# Clone the repository and install dependencies
RUN git clone  . &&  
    pip install --no-cache-dir -r  

# Default command to run the model
CMD [""python"",  ",2024-12-29T16:58:33Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/26
24,2762172052,Reference for normal generation throughput,"Hi,
I deployed the V3 model with vLLM on 8×H200 (TP=8), and the generation throughput is around 10  , which I think is somewhat slow. Could you give me a reference for normal generation throughput, or some methods to improve it? Thanks!",2024-12-29T09:21:20Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/24
23,2762164143,Why is convert.py needed?, ,2024-12-29T08:53:55Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/23
22,2762159206,Support GGUF so we can use it with Ollama and llama.cpp," 
Please add support as soon as possible.",2024-12-29T08:35:57Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/22
20,2761745031,Confusing answer in chat,"When I ask ""What model are you?"", it answers as follows:
I'm an AI language model called ChatGPT, specifically the GPT-4 architecture, created by OpenAI. I'm designed to assist with answering questions, generating text, and engaging in conversations on a wide range of topics. Let me know how I can help!


But sometimes it answers correctly, as follows:
I'm DeepSeek-V3, an AI model created exclusively by the Chinese Company DeepSeek. I'll do my best to help you. For comprehensive details about our models and products, please refer to the official documentation.


But if I ask in Chinese, it always seems to answer correctly.",2024-12-28T14:34:41Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/20
19,2761410391,A question about num_hidden_layers=61?,"I see num_hidden_layers = 61 and num_nextn_predict_layers = 1, whereas num_hidden_layers = 60 for DeepSeek-V2. So I want to know: is the new layer the MTP module, or does the main Transformer stack simply have 61 blocks?

Looking forward to your answer!",2024-12-28T03:08:24Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/19
18,2761389930,Any comparison of MLA vs MQA?,"The V2 paper only directly compares MLA with MHA. Table 8 in the V2 paper compares only dense models with MHA, MQA, and GQA, without MLA. Table 9 compares MoE models with MHA and MLA, without MQA or GQA.

I feel MLA is, in essence, somewhat similar to MQA. Are there any apples-to-apples ablation studies comparing MLA with MQA?

",2024-12-28T01:44:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/18
15,2760978283,V3 repeats function calls?,"This is my function call (simulating a weather service):

**request**

 
**DeepSeek reply (with function-call request)**
 
And I reply:
 
BUT DeepSeek asks again instead of summarizing the weather data!!!
 

This doesn't happen with OpenAI's function calling.
Is there anything wrong?

Thanks.",2024-12-27T14:57:14Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/15
14,2760710231,How can I try out the speculative decoding mentioned in the paper?, ,2024-12-27T10:33:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/14
13,2760657758,How is MLA 4-way Tensor Parallelism (TP4) with Sequence Parallelism (SP) implemented?,"Great work, Deepseekers!

The model and the tech report are true masterpieces. But I still have some questions about inference.

Could you reveal more details about how MLA 4-way Tensor Parallelism (TP4) with Sequence Parallelism (SP) is implemented?

How are the attention weights partitioned across different GPUs?",2024-12-27T09:48:01Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/13
12,2760647491,Open source or open model?,"In your announcements, you said this is ""open source"", but at this point it looks like it is only an ""open model"".

(Only the model and the inference code required to run it were released.)

Do you plan on releasing the source?

(see also #10)",2024-12-27T09:38:36Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/12
11,2760503215,"Hello, how many GPUs (A800 80GB) are needed to deploy the DeepSeek-V3 model with vLLM?","Hello, how many GPUs (A800 80GB) are needed to deploy the DeepSeek-V3 model with vLLM?",2024-12-27T07:11:44Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/11
10,2760334638,How can I fine-tune this model?, ,2024-12-27T03:14:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/10
9,2760267790,Docs: add vLLM as supported engine,"Congratulations on the release! As of today, vLLM v0.6.5 supports the DeepSeek-V3 model. We look forward to future collaboration!",2024-12-27T01:11:51Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/9
8,2759992513,Is there a plan to release a smaller model?,"Thank you very much for your work. The new V3 model is very powerful, but it is too large. Is there a plan to release a smaller model of around 14B, like V2?",2024-12-26T17:35:53Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/8
7,2759989512,Function Calling not working,"The output of the code below is:
User>    How's the weather in Hangzhou?
Model>   
There is no content after Model> here!!!

from openai import OpenAI

def send_messages(messages):
    response = client.chat.completions.create(
        model=""deepseek-chat"",
        messages=messages,
        tools=tools
    )
    return response.choices[0].message

client = OpenAI(
    api_key=""<your api key>"",
    base_url="" 
)

tools = [
    {
        ""type"": ""function"",
        ""function"": {
            ""name"": ""get_weather"",
            ""description"": ""Get weather of an location, the user shoud supply a location first"",
            ""parameters"": {
                ""type"": ""object"",
                ""properties"": {
                    ""location"": {
                        ""type"": ""string"",
                        ""description"": ""The city and state, e.g. San Francisco, CA"",
                    }
                },
                ""required"": [""location""]
            },
        }
    },
]

messages = [{""role"": ""user"", ""content"": ""How's the weather in Hangzhou?""}]
message = send_messages(messages)
  {messages[0]['content']}"")

tool = message.tool_calls[0]
messages.append(message)

messages.append({""role"": ""tool"", ""tool_call_id"": tool.id, ""content"": ""24℃""})
message = send_messages(messages)
  {message.content}"")",2024-12-26T17:31:54Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/7
6,2759988936,Function Calling not working,"The output of the code below is:
User>    How's the weather in Hangzhou?
Model>   
There is no content after Model> here!!!

from openai import OpenAI

def send_messages(messages):
    response = client.chat.completions.create(
        model=""deepseek-chat"",
        messages=messages,
        tools=tools
    )
    return response.choices[0].message

client = OpenAI(
    api_key=""<your api key>"",
    base_url="" 
)

tools = [
    {
        ""type"": ""function"",
        ""function"": {
            ""name"": ""get_weather"",
            ""description"": ""Get weather of an location, the user shoud supply a location first"",
            ""parameters"": {
                ""type"": ""object"",
                ""properties"": {
                    ""location"": {
                        ""type"": ""string"",
                        ""description"": ""The city and state, e.g. San Francisco, CA"",
                    }
                },
                ""required"": [""location""]
            },
        }
    },
]

messages = [{""role"": ""user"", ""content"": ""How's the weather in Hangzhou?""}]
message = send_messages(messages)
  {messages[0]['content']}"")

tool = message.tool_calls[0]
messages.append(message)

messages.append({""role"": ""tool"", ""tool_call_id"": tool.id, ""content"": ""24℃""})
message = send_messages(messages)
  {message.content}"")",2024-12-26T17:31:13Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/6
5,2759938423,add gradio app,Added a Gradio app that uses the ai-gradio Python package to deploy a chat app in just a few lines of code,2024-12-26T16:27:05Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/5
4,2759930283,Converted bf16 Model on Hugging Face,I've uploaded the converted bf16 model here for everyone to use freely:  ,2024-12-26T16:16:51Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/4
3,2759906696,Everything is OK,"As Value99 said, since the model cannot support RolePlay, we can delete this issue.

Wishing the team success. Thank you.

 
",2024-12-26T15:49:08Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/3
2,2759872971,handle missing scale_inv_name,"Fixed an issue where   and   (e.g.   and  ) were not in the same SafeTensor, causing an assertion error due to scale_inv_name not being in the state_dict.",2024-12-26T15:09:30Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/2
1,2759777407,Are there any plans to release the knowledge distilling R1 sample?, ,2024-12-26T13:28:49Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/1