NumeroIssue,IdIssue,TituloIssue,DescricaoIssue,CriacaoIssue,RepositorioIssue,LinkIssue
754,2905089893,Create django.yml, ,2025-03-09T00:09:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/754
752,2900922588,[BUG],"**Describe the bug** DeepSeek AI is stuck on the ""One more step before you proceed..."" loading screen and does not proceed further, preventing access to the platform. **To Reproduce** Steps to reproduce the behavior: Open a web browser (Chrome, Firefox, Edge, etc.). Navigate to DeepSeek AI. The page displays ""One more step before you proceed..."" and does not load further. **Expected behavior** DeepSeek AI should load successfully and allow access to its features. **Screenshots** **Additional context** Device: Windows Network Type: All (Wi-Fi, mobile data, office network) Antivirus: K7 Ultimate Security Troubleshooting Attempts: Tried different browsers Cleared cache and cookies Disabled browser extensions Used incognito mode Checked settings",2025-03-06T16:47:20Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/752
748,2899176133,"[BUG] Once a conversation gets long enough, the AI stops feeling lively and starts to sound robotic","**Describe the bug** I am trying to use the DeepSeek API to build an AI agent with persistent memory, like a real person whose memories never fade. However, I found that once the conversation reaches a certain length (first observed at around tokens used = 35K), the AI's answers start to lack originality and sometimes repeat earlier outputs verbatim. As the token count grows, the AI appears increasingly ""dependent on its existing token memory"". **To Reproduce** Make tokens used ≥ 35K **Expected behavior** ",2025-03-06T02:42:38Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/748
747,2896603638,Add zh version of README,"Sync with the current README.md. Hope this action can: - trigger more localization versions, PRs, and issues. 
- create a basic style for localization variants",2025-03-05T08:27:37Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/747
746,2896471158,Sharing my study notes about DeepSeekV3," Here are my notes on DeepSeekV3 (currently only the model part) ",2025-03-05T07:22:59Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/746
745,2896397540,"How do I enable web search, image upload parsing, and attachment parsing in the official API? Is there live human customer service or a technical support group?","curl -X POST "" -H ""Authorization: Bearer KEY"" -H ""Content-Type: -d '{ ""model"": ""deepseek-chat"", ""messages"": [ {""role"": ""user"", ""content"": ""How do I enable web search, image upload parsing, and attachment parsing for the deepseek-chat model?""} ], ""stream"": true }'",2025-03-05T06:40:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/745
744,2895931860,Could you share a few data examples from pure RL training?,Thank you very much!,2025-03-05T02:17:18Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/744
743,2895661685,Create an AI that can write a WORKING SIMPLE Python APP.,"Just hilarious to see all the HYPE around how great DEEPSEEK is. While it might be good at TELLING JOKES, it's terrible at writing PYTHON CODE. Every time I have asked it to create a simple Python app, it has failed, with over 20 attempts each time. Hilarious how much money has been wasted on AI. Just a JOKE!!!!!!! ",2025-03-04T23:08:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/743
741,2894073252,[BUG] A large negative bias in the model's Gate selects masked expert groups,"If the bias is a large negative value, almost all scores become negative while the masked expert groups keep a score of 0, so the masked groups get selected and experts whose original scores were lower are chosen. Consider masking by adding -inf, or constraining the bias so that it is >= 0",2025-03-04T12:20:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/741
736,2891088516,Docs: add LightLLM as supported engine,"Thanks for the significant contributions to the community. LightLLM now supports deployment in its latest release, with PD-disaggregation and other features coming soon. We look forward to future collaborations. 
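Our optimizations will also be continuously introduced on ",2025-03-03T12:17:36Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/736
734,2890203249,Questions about DualPipe that remain unclear after reading the paper and other materials; please answer,"Q1. In Fig. 5 of the DeepSeekV3 paper, what are the three forward passes on device0 between orange cell 6 and orange cell 9? Q2. When DeepSeekV3 is deployed, does device0 host both the first and the last layer of the model in order to support bidirectional training? I ask because in Fig. 5, when device0 takes in the first batch for its forward pass, device7 also takes in a batch for a forward pass at the same time ",2025-03-03T05:24:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/734
733,2890025244,[Question] Are there any reference cases of deploying a consumer-facing (toC) V3 inference service?,"The discussion environment on Discord is really poor, so I am reluctantly asking for help here in the issues; please bear with me! Current hardware: GPUs: several servers with 8x A800 (80GB) each (sadly, since they are not Hopper architecture, we cannot try DeepEP). What we have tried: 1. Ollama deployment: a single machine supports only about 2~3 concurrent requests, and Ollama does not seem to support multi-node distributed deployment. 2. vLLM: deployed across multiple nodes with IB + Ray, but concurrency did not improve much either. For now we are aiming for 20~30 concurrent requests. Could anyone suggest some ideas or existing solutions? I have never deployed a model with this many parameters before and would be very grateful for guidance from anyone with relevant experience!",2025-03-03T02:48:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/733
731,2888946352,Question: Where can I download a 1.5B model for ollama?,"Hello, I tried searching online for a 1.5B model for ollama (as I do not have enough computer resources), without success. My computer currently runs Linux. Would you please be able to provide a link to download the open-source model? Thank you, ----",2025-03-01T14:40:29Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/731
730,2888835157,Can the open-source repository be mirrored to Gitee?,"As the title says... GitHub keeps acting up",2025-03-01T11:30:27Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/730
729,2888036512,add intro file,"This is a convenience file; it is the first section of the Technical Report.",2025-02-28T20:48:46Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/729
728,2886414741,[BUG] tool call not work,"**Describe the bug** When I use a tool call, the response comes back empty **To Reproduce** Use the EXAMPLE in  as is **Expected behavior** There should be a response **Screenshots** ",2025-02-28T07:25:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/728
725,2884924019,RTL style (direction) for Persian Chats,"Like this issue: DeepSeek supporters, please fix this the way the ChatGPT chatbot does. 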
We need RTL direction for chats. ",2025-02-27T15:52:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/725
724,2883462469,Possible Typo in ZB1P Pipeline Bubble Calculation Formula in DeepSeek-V3 Report,"In the DeepSeek-V3 report PDF, I noticed that on page 13, the total bubble for the ZB1P pipeline parallel method is described as (PP-1)(F+B-2W), whereas in the original Zero Bubble paper, the total bubble for the ZB-H1 method should be (PP-1)(F+B-W). Could this be a typo?",2025-02-27T05:19:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/724
722,2882237048,Request for a QQ group; the WeChat group has too few people,Request for a QQ group; the WeChat group has too few people. Please set up a QQ developer group,2025-02-26T16:58:01Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/722
721,2880823234,[BUG]json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0).,"from openai import OpenAI client = OpenAI(api_key="""", base_url="" response = client.chat.completions.create( model=""deepseek-reasoner"", messages=[ {""role"": ""system"", ""content"": ""You are a helpful assistant""}, {""role"": ""user"", ""content"": ""1+1=?""}, ], temperature=0 ) print(response) print(response.choices[0].message) print(response.choices[0].message.content) When I use the official code to call the API, I get: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0). Why is this? The code works the first time I run it, but only once; after that it stops working. 
Can anyone solve this?",2025-02-26T09:12:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/721
720,2880811412,modify the explanation of MLA, ,2025-02-26T09:08:31Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/720
718,2879661971,Rename DeepSeek_V3.pdf to DeepSeekv3pdf, ,2025-02-25T22:07:59Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/718
716,2878092407,Technical inquiry: AI + dynamic knowledge bases with per-enterprise knowledge base data isolation,"My business scenario: Based on DeepSeek's large model (or application?), I want my thousands of enterprise users to come to my website and ask questions using this model (or application?). Each enterprise can then create its own knowledge base and attach it to the model (or application?), so that my enterprise users can use the model while also having their own proprietary knowledge base. Each enterprise's knowledge base therefore needs data isolation. What I have tried so far: 1. The open API's model endpoints cannot be combined with a knowledge base for question answering. 2. The open API's application side does have knowledge bases, but I cannot dynamically select among the different knowledge bases under that application according to my business scenario. So I need both AI reasoning and Q&A capability and a knowledge base capability for each enterprise user, with each enterprise's data kept isolated; business data must not leak between tenants, after all. What technical approaches could implement this scenario? Thanks for any guidance.",2025-02-25T11:34:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/716
714,2876542746,DeepSeek 360 applications discussion group,"# ",2025-02-25T00:54:52Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/714
711,2873660513,"[BUG] (Due to technical issues, the search service is temporarily unavailable.)","**Describe the bug** A clear and concise description of what the bug is. (Due to technical issues, the search service is temporarily unavailable.) **To Reproduce** Steps to reproduce the behavior. Enable Search function and search for something **Expected behavior** A clear and concise description of what you expected to happen. **Screenshots** If applicable, add screenshots to help explain your problem. **Additional context** Add any other context about the problem here. ",2025-02-24T01:38:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/711
710,2873630090,AI-Powered Debugging Bot,"**Is your feature request related to a problem? Please describe.** Developers spend a lot of time debugging, but understanding stack traces and logs is challenging. **Describe the solution you'd like** DeepSeek could analyze errors, suggest fixes, and even explain what went wrong in simple terms. 
",2025-02-24T01:08:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/710 707,2872883370,Delete Meaningless issues #2," 1. #706 AD 2. #704 3. #703 ~~Generated by DeepSeek~~ 4. #698 5. #697 AD 6. #694 AD 7. #695 8. #664 ",2025-02-23T03:14:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/707 705,2870908224,"why is the ""self.dim"" have to be 7168","`class Gate(nn.Module): super().__init__() self.dim = args.dim self.topk = args.n_activated_experts self.n_groups = args.n_expert_groups self.topk_groups = args.n_limited_groups self.score_func = args.score_func self.route_scale = args.route_scale self.weight = nn.Parameter(torch.empty(args.n_routed_experts, args.dim)) self.bias = nn.Parameter(torch.empty(args.n_routed_experts)) if self.dim == 7168 else None` why??",2025-02-22T15:49:36Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/705 704,2870756970,Deep seek, ,2025-02-22T13:58:43Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/704 703,2870663380,功能建议 - 提升用户交互与个性化体验,"**建议描述** 我希望为 提供以下功能改进建议,以提升用户体验和个性化服务: 1. **更精准的用户意图预测** - 当前用户需要提供非常准确的提示词才能获得理想结果。建议优化模型,使其能够更准确地预测用户意图,减少对精确提示词的依赖。 2. **用户画像功能** - 随着用户提问的增多, 可以逐步构建用户画像,预测用户的职业、家庭、身体状况等信息。 - 例如,如果用户提问过关于职业、家庭或心理健康的问题, 可以以用户账户为键存储这些信息,并用于提供更个性化的回答。 - 注意:此功能需要严格遵守隐私保护原则,确保用户数据安全。 3. **用户个性化记忆功能** - 允许用户向 提供建议、意见或需要记住的个性化信息。 - 当其他用户问及相关内容时, 可以将这些信息作为答案提供。 - 此功能仅适用于未在网上检索到的冷门且个性化的内容。 - 需要评估问题和答案的重要性或严重性,避免滥用或泄露敏感信息。 **背景** 当前 AI 模型的交互方式仍然依赖于用户的精确输入,且缺乏对用户个性化需求的深度理解。通过引入用户画像和个性化记忆功能,可以显著提升用户体验,使 更加智能和贴心。 **具体内容** 1. **意图预测优化** - 通过分析用户历史提问和上下文,优化模型对用户意图的理解。 - 减少对精确提示词的依赖,提供更自然的交互体验。 2. **用户画像构建** - 根据用户提问内容,逐步构建用户画像(如职业、家庭、健康状况等)。 - 在保护用户隐私的前提下,利用画像提供个性化回答。 3. 
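**Personalized memory** - Allow users to submit personalized information that should be remembered. - When other users ask about related content, this information can be provided as an answer. - Introduce a review mechanism to ensure the safety of questions and answers. **Expected outcome** - More accurate prediction of user intent, reducing the input burden on users. - More considerate, personalized service through user profiles and personalized memory. - A better interaction experience for users, making the assistant smarter.",2025-02-22T10:50:48Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/703
697,2868905071,Free API for the full-strength R1 is here! Tens of millions of tokens given away!,"▌Key point: this is the real developer power tool! No top-up needed! No racing to grab a slot! Click to go straight there👉 🔥 Three core advantages that crush the competition: ✔️ Claim 5 million tokens daily until 2025.2.28 (3 billion in total!) ✔️ The only full-strength 671B-parameter model on the web ✔️ Supports real-time web search + private knowledge base integration This is not an ad; if it is, I will eat three jin of crap! This is a win-win: use my invitation code and you also get an extra 15 yuan voucher, equivalent to 7.5 million tokens! 🚀The highest-quality DeepSeek R1 tokens on the web, latency under 20ms, 5 million TPM! 🔥DeepSeek-R1 at its best, smooth with no lag! 🎁 For every new user invited, both sides receive up to 30 million+ tokens; the more you invite the more you get, with no cap! Grab the freebies now: ",2025-02-21T12:29:41Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/697
696,2868360247,[BUG] Gate/grouped_topk scoring func dtype issue(BF16 vs FP32),"**Describe the bug** In the Hugging Face implementation, GateMoE uses FP32 for the Linear layer and further computation. see while in the GitHub implementation, the Gate weight is BF16 and BF16 is used for the linear layer, scoring_func (sigmoid) and further computation. see link Would this cause an accuracy issue? and any impl? **To Reproduce** Steps to reproduce the behavior. **Expected behavior** A clear and concise description of what you expected to happen. **Screenshots** If applicable, add screenshots to help explain your problem. **Additional context** Add any other context about the problem here. ",2025-02-21T08:31:30Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/696
695,2868316957,why is it always busy,"ALWAYS WHEN I TYPE FIRST QUESTION IT GOES BUSY woe",2025-02-21T08:08:14Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/695
694,2868220356,DeepSeek R1 671B full-strength version free for a limited time! Get 5 million free tokens every day,"▌Key point: this is the real developer power tool! No top-up needed! No racing to grab a slot! Click to go straight there👉 🔥 Three core advantages that crush the competition: ✔️ Claim 5 million tokens daily until 2025.2.28 (3 billion in total!) 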
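✔️ The only full-strength 671B-parameter model on the web ✔️ Supports real-time web search + private knowledge base integration 💡 Real user review: ""On some platforms even customer service vanishes once you have topped up? That simply does not happen with DeepSeek R1! Response speed is maxed out, there are so many features it feels like cheating, and above all it is rock solid!"" (a full-stack engineer who prefers to remain anonymous) ",2025-02-21T07:12:57Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/694
693,2868146364,Are there plans to release smaller-parameter DeepSeek-V3 models?,"The code includes configs for 16B, 236B, and 671B model sizes, but only the 671B weights appear to be available at the moment. Are there plans to open-source the other two versions?",2025-02-21T06:40:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/693
689,2865352999,Distillation of non-reasoning models,"Have you tried distilling non-reasoning models, and how well did it work?",2025-02-20T07:50:04Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/689
688,2864784328,The returned content is in a rich-text-like format; what existing front-end tool can render the formatting fully?," ",2025-02-20T01:26:06Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/688
687,2863706725,Turkish Translation: README & README_WEIGHTS,"### **Description:** This PR adds Turkish translations for the following documentation files: - → - → The translations maintain the original document structure, ensuring both technical accuracy and Markdown compatibility. To improve organization, I can move the translations to a folder if preferred. Additionally, we can list available translations in the main for better accessibility. Let me know if you’d like any modifications! 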
🚀 GitHub Repository: DeepSeek-V3-Turkish 📩 **Contact:** [can.deliktas Best regards, **Can Deliktaş** ",2025-02-19T15:34:29Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/687
686,2863480789,[BUG] torch version unavailable (Fedora 41),"**Describe the bug** The torch version in the requirements is not available on Fedora 41 **To Reproduce** Install Fedora 41 Update to latest packages Follow install **Expected behavior** The requirements install even on newer versions **Screenshots** NA **Additional context** NA ",2025-02-19T14:11:43Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/686
685,2862792912,[BUG],"**Describe the bug** We deployed the DeepseekV3 model on a server with eight 3090 GPUs, and we got the following error when starting it from the command line **To Reproduce** Our CUDA is 12.1, PyTorch is 2.4.1, and Python is 3.10 **Expected behavior** A clear and concise description of what you expected to happen. **Screenshots** **Additional context** Add any other context about the problem here. ",2025-02-19T09:47:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/685
680,2861338347,Update model.py,hggchfcchchchchgchc,2025-02-18T18:52:32Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/680
679,2859212848,[BUG],"**Describe the bug** Scenario: chatting with V3 on the official website, with neither DeepThink nor web search selected. I entered: ""How many weeks are there this year, and which week is this?"" The reply was: ""2023 has 52 weeks. Today is October 12, 2023, and this is week 41 (from October 9, 2023 to October 15, 2023)."" However, I actually had this conversation shortly after 10 o'clock on February 18, 2025, so the reply is clearly wrong. **To Reproduce** Steps to reproduce the behavior. **Expected behavior** A clear and concise description of what you expected to happen. **Screenshots** **Additional context** Add any other context about the problem here. ",2025-02-18T03:02:38Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/679
677,2857906954,Which training framework does DeepSeek-V3 use?,"**Is your feature request related to a problem? 
Please describe.** I saw in the DEEPSEEK_LLM paper that you use the HAI_LLM framework as your model training framework, together with 3D parallelism. Do V3 and R1 still use this framework, and could you share how the 3D parallelism is implemented? **Describe the solution you'd like** A clear and concise description of what you want to happen. **Describe alternatives you've considered** A clear and concise description of any alternative solutions or features you've considered. **Additional context** Add any other context or screenshots about the feature request here. ",2025-02-17T13:42:21Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/677
676,2857645037,Code Audit on DeepSeek V3,"Hi all! I have created a code auditing tool using AI and I let DeepSeek-V3 run through it. Hopefully it's helpful: It definitely has one of the better outcomes & one orange flag: CodeDD is still WIP (developed by just me), so I am happy and keen to receive your feedback! Camillo PS: Not sure if this is the right spot to post this ",2025-02-17T12:00:02Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/676
675,2857228342,deepseek amis amis-api,Recommending a very handy ,2025-02-17T09:10:45Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/675
673,2856938307,What do the 3 config files in the inference example each mean?,There are 3 config files under the inference directory; what does each one represent?,2025-02-17T06:53:31Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/673
672,2856856622,"Why does the FP8 accumulator need 34 bits, and what is the theoretical basis?", ,2025-02-17T06:03:01Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/672
671,2856251959,Update LICENSE-CODE, ,2025-02-16T18:14:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/671
670,2856104414,[Question] Why the DeepseekV3Model class disables gradient checkpointing by default,"Thanks to the DeepSeek team for the excellent work! While reading the DeepSeek-V3 model code on HuggingFace, I noticed that the DeepseekV3PreTrainedModel class declares support for gradient checkpointing, but the DeepseekV3Model class seems to disable gradient checkpointing by default. 
(line 1372) The code excerpt is as follows: Could the developers explain why it is set up this way? I look forward to answers from the developers and the community.",2025-02-16T13:31:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/670
669,2855881617,[BUG] Unexpected Chinese-Language Response in English Conversation (Cached Data Crosstalk),"During a conversation about regex whitelisting for #BlockIt on macOS, the AI returned a Chinese-language technical tutorial about image alignment that was unrelated to the query. This appears to be a cached data cross-talk error. 1. **User Query**: can you make a regex to whitelist a subreddit 2. **Follow-up**: Clarifications about #BlockIt macOS usage 3. **Observed Result**: Response included Chinese technical content about image alignment (no user-provided Chinese input) **Expected Behavior**: Responses should remain focused on the user's query without injecting unrelated cached data. **Actual Behavior**: Irrelevant Chinese-language OpenCV tutorial content appeared mid-conversation. **Technical Details**: - Occurrence Time: 2025-02-16T04 22Z (system time) - User Locale: en-AU - Affected Service: Likely response generation pipeline - Error Type: Probable cached data mix-up between user sessions **Additional Context**: - No Chinese input provided by user - Error persisted across multiple follow-up queries - User Agent: Raycast AI - Full conversation thread available upon request ",2025-02-16T04:54:33Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/669
665,2855067273,Could a map/reduce strategy be used to compute locally on cluster nodes and reduce data movement overhead?, ,2025-02-15T02:58:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/665
664,2854453688,Has DeepSeek considered training a large model on Lojban? This constructed language is highly formal and has explicit rules, ,2025-02-14T18:25:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/664
661,2852937129,How to convert a bfloat16 model to fp8?,"I use the code below in kernel.py to convert bfloat16 to fp8, but I cannot load the converted fp8 model with vLLM. 
import torch
import triton
import triton.language as tl
from typing import Tuple

@triton.jit
def act_quant_kernel(x_ptr, y_ptr, s_ptr, BLOCK_SIZE: tl.constexpr):
    # Each program instance quantizes one block of BLOCK_SIZE contiguous elements.
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    x = tl.load(x_ptr + offs).to(tl.bfloat16)
    # Per-block scale: max |x| divided by 448, the largest finite float8_e4m3fn value.
    s = tl.max(tl.abs(x)) / 448.
    y = x / s
    y = y.to(y_ptr.dtype.element_ty)
    tl.store(y_ptr + offs, y)
    tl.store(s_ptr + pid, s)

def act_quant(x: torch.Tensor, block_size: int = 128) -> Tuple[torch.Tensor, torch.Tensor]:
    assert x.is_contiguous()
    assert x.size(-1) % block_size == 0
    y = torch.empty_like(x, dtype=torch.float8_e4m3fn)
    # One scale per block along the last dimension.
    s = x.new_empty(*x.size()[:-1], x.size(-1) // block_size, dtype=torch.bfloat16)
    grid = lambda meta: (triton.cdiv(x.numel(), meta['BLOCK_SIZE']), )
    act_quant_kernel[grid](x, y, s, BLOCK_SIZE=block_size)
    return y, s ",2025-02-14T07:56:16Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/661
660,2852538459,What does world size mean in the source code?,"As the title says: what does world size mean in the source code, and what is the point of the integer division by world size there?",2025-02-14T02:46:30Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/660
659,2852466775,[Question] about suggestions on hardware design in DeepSeek v3 paper,"I am a hardware designer and really appreciate that the DeepSeek v3 paper included suggestions on hardware design! I have a few questions: 1. Could you please share the methodology of the experiment where you measured Hopper Tensor Core FP8 GEMM accuracy? The paper states that only the highest 14 bits are calculated in the tensor core when doing FP8 GEMM. 2. Do you have any suggestions on inference-focused hardware? For example, the FP8 accumulation precision needed, and generally any other suggestions? Thank you!",2025-02-14T01:47:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/659
658,2852424699,Some questions about Multi-Head Latent Attention in DeepSeek,"1. Why is the Query split, after up-projection, into a part that needs RoPE positional encoding and a part that does not, whereas the Key obtains its RoPE part after down-projection? 2. Why is the Value vector split out of the up-projected KV? Why are Key and Value not independent? 
",2025-02-14T01:10:16Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/658 657,2851611518,[Edited] Fix minor bug in the main function,"Changes made in branch: **MayureshMore:main** [Edited] Fix minor bug in the main function ",2025-02-13T16:58:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/657 656,2851525175,del other, ,2025-02-13T16:21:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/656 654,2850396123,demo project," I'm a newbie and want to learn how to generate my own model based on DeepSeek's knowledge distillation technology. import torch import torch.nn as nn import torch.optim as optim import torchvision import torchvision.transforms as transforms # 设置设备 device = torch.device(""cuda"" if torch.cuda.is_available() else ""cpu"") # 超参数 epochs = 20 batch_size = 256 temperature = 4 # 温度参数 alpha = 0.7 # 软标签损失权重 # 数据加载 transform = transforms.Compose([ transforms.RandomCrop(32, padding=4), transforms.RandomHorizontalFlip(), transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) ]) train_set = train=True, download=True, transform=transform) train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True) # 定义老师模型(ResNet-18) teacher = torchvision.models.resnet18(pretrained=True) teacher.fc = nn.Linear(teacher.fc.in_features, 10) # CIFAR-10有10类 teacher = teacher.to(device) # 定义学生模型(MobileNetV2) student = torchvision.models.mobilenet_v2(pretrained=True) student.classifier[1] = nn.Linear(student.last_channel, 10) student = student.to(device) # 训练老师模型(此处假设老师已预训练好,直接加载) # 实际中需要先训练老师模型,此处为简化跳过 # 定义损失函数和优化器 criterion_hard = nn.CrossEntropyLoss() # 硬标签损失 criterion_soft = nn.KLDivLoss(reduction='batchmean') # 软标签损失 optimizer = optim.Adam(student.parameters(), lr=0.001) # 蒸馏训练循环 for epoch in range(epochs): teacher.eval() # 固定老师模型 student.train() # 训练学生模型 running_loss = 0.0 for inputs, labels in train_loader: inputs, labels = inputs.to(device), labels.to(device) # 前向传播 with 
torch.no_grad(): teacher_logits = teacher(inputs) student_logits = student(inputs) # 计算损失 # 软标签损失(使用温度参数软化) soft_loss = criterion_soft( nn.functional.log_softmax(student_logits temperature, dim=1), nn.functional.softmax(teacher_logits temperature, dim=1) ) * (alpha * temperature * temperature) # 缩放损失 # 硬标签损失 hard_loss = criterion_hard(student_logits, labels) * (1 - alpha) total_loss = soft_loss + hard_loss # 反向传播 optimizer.zero_grad() total_loss.backward() optimizer.step() running_loss += total_loss.item() print(f'Epoch {epoch+1}, Loss: print(""Distillation finished!"")",2025-02-13T08:58:19Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/654 653,2850216671,[Question] On IBGDA implementation.,"> Additionally, we leverage the IBGDA (NVIDIA, 2022) technology to further minimize latency and enhance communication efficiency. May I know that are you using NVSHMEM or libgdsync or any other? Thanks.",2025-02-13T07:36:50Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/653 652,2849944750,[BUG]DEEPSEEK无法回答正常问题," ",2025-02-13T04:48:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/652 650,2849360545,[BUG] DeepSeek - Outdated Vulnerable Software - High (8.8),"# Risk: High (8.8) **NOTE: ** The score was derived from the highest vulnerability in the software dependency package. --- # Description: While conducting a security review of , we observed the noted web application used outdated 3P software with known vulnerabilities. The risks associated with using such dependencies are significant and include security vulnerabilities, data breaches, malware infections, compliance violations, and reputation damage. --- # Impact: These vulnerabilities can be exploited by attackers to compromise the system, steal sensitive data, infect it with malware, violate industry regulations, and harm the organization's reputation. 
Outdated software is often no longer supported by its developers, which means that any security vulnerabilities that are discovered are unlikely to be patched. This makes it easier for hackers to exploit these vulnerabilities and gain access to our system or steal our data. --- # Affected Assets: ## Affected Software: 1. certifi 2. jinja2 3. tqdm 4. transformers 5. urllib3 --- # Evidence: ## depscan: ## snyk: ## trivy: --- # Replicate Finding: 6. Download & Install OWASP Dependency Check: 7. Navigate to the impacted repo (locally) & run the following command: 8. Open either the CSV file or HTML file for results --- # We recommend updating the affected software to the latest supported version. Additionally, we recommend hardening the system after patching and ensuring all installed software including the noted software are patched. For more details, please the references below! **Be advised, the above patch should be applied to all other system(s) running the impacted software that are managed by the team.** --- # References: 1. 2. 3. ",2025-02-12T21:10:36Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/650 649,2849355289,[BUG] Insecure Data Processing - Timing Attack Against Secret - High (7.4),"# Risk: High (7.5) **Score Explanation:** The vulnerability has a high confidentiality impact (C:H) because a successful timing attack can leak sensitive information. The issue is exploitable remotely (AV:N), though it requires significant effort and precision (AC:H). No privileges are required (PR:N), and the attack does not require user interaction (UI:N). Integrity is unaffected (I:N), but availability can be minimally impacted due to computational overhead (A:L). --- # Executive Summary DeepSeek-V3, a Mixture-of-Experts (MoE) LLM, is vulnerable to a **timing attack against secret verification** due to its use of a **non-constant-time** comparison routine in token processing. 
This flaw could allow an attacker to infer secret values, such as authentication tokens or cryptographic keys, by measuring response times. While DeepSeek-V3 delivers state-of-the-art performance, this issue poses a serious risk to applications that rely on it for secure processing of sensitive inputs. --- ## Detail Finding: While performing a security code review of , we observed the noted code repo is susceptible to a Timing Attack Against Secret. Specifically, the vulnerability arises from how **prompt_mask** is computed and used in **DeepSeek-V3's inference pipeline**. Particularly, the comparison logic at lines **59 and 68 in ** introduces **timing discrepancies** based on token values. 1. **Line 59** ( ) checks whether a token exists but does not use a **constant-time approach**, leading to processing time variations based on token content. 2. **Line 68** ( ) introduces further timing variations when checking the **end-of-sequence (EOS) token**. The timing attack vulnerability we’ve identified in DeepSeek-V3 aligns with the following **OWASP Top 10 for Large Language Model (LLM) Applications 2025** categories: 1. **LLM02:2025 Sensitive Information Disclosure**: This category addresses scenarios where LLMs inadvertently expose confidential data, including personal identifiable information (PII), financial details, or proprietary business information. In the context of DeepSeek-V3, the timing attack could allow attackers to infer sensitive information by analyzing response times, leading to unauthorized data access and privacy violations.  2. **LLM08:2025 Vector and Embedding Weaknesses**: This risk pertains to vulnerabilities in systems utilizing vectors and embeddings, especially in Retrieval Augmented Generation (RAG) setups. Weaknesses in how vectors and embeddings are generated, stored, or retrieved can be exploited to inject harmful content, manipulate model outputs, or access sensitive information. 
In DeepSeek-V3, the non-constant-time verification routine could be exploited to manipulate embeddings, leading to potential data leakage or unauthorized access.  --- ## Impact: An attacker could craft special inputs, measure response delays, and infer **private data** by exploiting these timing inconsistencies. Additionally, an attacker could perform the following attacks: - **Sensitive token leakage:** Attackers could extract secret keys, authentication tokens, or model-internal data through statistical analysis of response times. - **Potential model poisoning:** If DeepSeek is used in a **multi-tenant environment**, adversaries could deduce how different input sequences affect the model’s state. - **Increased risk in security-critical deployments:** AI-driven **access control mechanisms, chatbots handling confidential queries, or secure computations** are at risk. While DeepSeek-V3 offers **efficient inference and high performance**, this timing flaw could undermine its security, especially in sensitive use cases. --- # Affected Assets: ## Affected File(s): - - --- # Evidence: ## ## --- # Replicate Finding: 3. Clone the impacted repo: 4. Navigate into the noted repo 5. Open the impacted file(s) 6. Got to impacted line --- # We recommend implementing **constant-time operations** for secret-dependent comparisons. Instead of directly comparing tokens, leverage **PyTorch’s optimized cryptographic-safe operations**. Additionally, we recommend the following hardening steps: 1. **Use padding techniques** to mask timing variances. 2. **Normalize execution times** to reduce distinguishability. 3. **Conduct differential analysis** to detect timing discrepancies in responses. ## **Fix (Using Constant-Time Masking)** Modify ** ** to ensure **constant-time computation**: For more information & context, please see the reference section below. 
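A minimal sketch of what a constant-time comparison looks like in general, offered as an illustration only and not as the patch this report refers to (that code is not reproduced here). It assumes the values being compared are byte strings; the helper name `secrets_match` is hypothetical, while `hmac.compare_digest` is the Python standard-library call intended for this purpose.

```python
import hmac


def secrets_match(expected: bytes, candidate: bytes) -> bool:
    # Hypothetical helper for illustration; not part of the DeepSeek-V3 code base.
    # hmac.compare_digest is designed so that its running time does not depend
    # on the position of the first differing byte, unlike an early-exit `==`.
    return hmac.compare_digest(expected, candidate)


if __name__ == '__main__':
    print(secrets_match(b'token-123', b'token-123'))
    print(secrets_match(b'token-123', b'token-124'))
```

For inputs of equal length the comparison time does not reveal where they differ; the lengths of the inputs may still be observable.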
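**Please ensure that the above patch is applied to all affected software, services, applications, instances or systems managed by the team.** --- # References: 1. Constant-Time Comparison in PyTorch: [https 2. Preventing Timing Attacks in Deep Learning: [https 3. 4. ",2025-02-12T21:07:49Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/649
645,2847370884,How to make money; seeking expert advice,"The consulting platform I built by wrapping the API gets too few paying users; everyone chooses the free apps packaged by big companies. How should a solo founder break through?",2025-02-12T07:10:26Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/645
644,2846989669,Does DeepSeek currently have a technical community platform?,"**Is there currently a technical community platform?** I would like to ask about using DeepSeek on Rocky Linux 8.9 (64-bit), specifically the following points: Sharing experience with AI tools: recording everyday usage habits and exchanging best practices to help more people use AI tools efficiently. Reviewing the latest AI products: collecting newly released AI tools and providing detailed usage advice and objective evaluations to help users quickly understand their strengths and weaknesses. Exploring model deployment and applications: digging into local deployment, fine-tuning methods, and practical use cases for models such as DeepSeek, and sharing technical experience and solutions. A meeting of technology and ideas: building an open community that promotes the free exchange of technology and creativity and sparks more innovation. I look forward to discussing with everyone and advancing the adoption and application of AI technology!",2025-02-12T02:04:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/644
643,2846979759,Deploying DeepSeek as a Docker service on Rocky Linux 8.9 (64-bit),"I would like to ask about deploying the DeepSeek API on a Rocky Linux 8.9 (64-bit) system, specifically: 1. Can the DeepSeek service be deployed standalone with Docker, similar to how other services (such as link-ai, which connects to WeChat bots) are deployed? 2. Or is there a standalone deployment option similar to ChatGPTNextWeb? I look forward to your answer, thank you!",2025-02-12T01:55:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/643
641,2845452444,Add Turkish Translation for README and README_WEIGHTS ," **Add Turkish Translation for README and README_WEIGHTS** **Description:** This PR adds a Turkish translation for the following documentation files: - - The translation ensures both **technical accuracy** and **Markdown compatibility** while maintaining the original document structure. Let me know if any modifications are needed! 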
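🚀 My E-mail:can.deliktas@protonmail.com",2025-02-11T13:49:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/641 638,2844578674,Is tool calling supported on the open-source model?,Is tool calling supported on the open-source model? If so how should it be invoked?,2025-02-11T08:10:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/638 636,2844205735,Create 陈诚,"Write me a paper ",2025-02-11T03:48:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/636 630,2841540080,How much VRAM is needed to deploy deepseek v3 in BF16 and in FP8 respectively?,All the figures I have found so far come from third-party blogs; could the official documentation be updated?,2025-02-10T07:12:22Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/630 629,2841469683,Looking for a tutorial or guidance on training a large language model (LLM) from scratch,"Hello everyone! I am an LLM beginner and recently wanted to learn from scratch how to train a large model, but I found that online tutorials are either too fragmented or have too high a barrier to entry. I hope an experienced expert can provide a step-by-step tutorial to help me train a large model from 0 to 1. ### Challenges I am currently facing: Data preparation: how do I prepare the high-quality data needed for training? How much data is enough? Model architecture: how do I choose a suitable model architecture (for example Transformer, GPT, BERT)? Training environment: how do I set up the training environment (GPU and TPU configuration, training frameworks such as PyTorch or TensorFlow)? Optimization and tuning: how do I adjust hyperparameters so that the model converges? Distributed training: how do I use multiple machines or multiple GPUs for training to avoid memory limits? Other questions: how do I evaluate training results and avoid problems such as overfitting? ### Help I am hoping for: Is there a recommended tutorial or document that explains from start to finish how to train a large model? If there are related GitHub projects or open-source code, please share any material I can refer to! Any practical experience and advice would be greatly appreciated, especially on how beginners can avoid common pitfalls. Thank you all very much for your help; I look forward to your valuable input! ",2025-02-10T06:29:29Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/629 627,2841245345,Create DeepSeek-V3،, ,2025-02-10T03:34:50Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/627 626,2841243478,How can I ensure that the same input produces exactly the same output across multiple generations?,"I set the following parameters: tem=0 top_p=1 seed=123 but I still cannot get consistent output across multiple generations for the same input. Does anyone know a way to do this?",2025-02-10T03:32:55Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/626 625,2841124774,deepseek development platform,Does deepseek have its own development platform through which enterprises can build their own agents?,2025-02-10T01:48:05Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/625 624,2841120859,Can a 4060 GPU of laptop version run the DeepSeek model locally?,Thanks for your great contribution in development of LLM. I would like to know if a 4060 GPU of laptop version can run the DeepSeek locally.,2025-02-10T01:44:01Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/624 623,2841120041,How can I fine-tune or distill on my own data with the deepseek model? Or should I use traditional lora? 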
The official docs do not seem to cover this, ,2025-02-10T01:43:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/623 621,2840715901,What is the fastest way to move from autonomous driving algorithm engineer to large model algorithm engineer?,"My daily work involves transformer. Previous projects also fused lidar and camera data, which I can use for learning. My daily work also includes some model compression work, such as mixed-precision training. In addition, I use openmmlab's distributed training. How can I move from autonomous driving algorithm engineer to large model algorithm engineer as quickly as possible?",2025-02-09T14:27:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/621 619,2840323186,Numerical Stability in Scaling Factor Computation (s = tl.max(tl.abs(x)) / 448.),"However, it's unclear why 448. was chosen as the divisor. This fixed value might not be optimal for all datasets, potentially leading to numerical instability in cases with extreme outliers or varying distributions. Why was 448. chosen? Was this value derived empirically, or is there a theoretical justification? If this was tuned for a specific dataset or hardware, it would be useful to document the rationale behind 448.. ",2025-02-08T23:19:02Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/619 616,2839772903,chore: update README.md to improve layout, ,2025-02-08T10:30:16Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/616 615,2839737326,"It's not a multimodal model, so why can it do multimodal understanding","It seems that v3 is not a multimodal model, but in the web application (chat.deepseek.com) it appears that one can upload an image and it understands the content of the image. I'm curious why it can do so.",2025-02-08T09:27:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/615 614,2839733815,Not Working," My first prompt works properly, but every prompt after the first attempt fails. Please check why this is happening.",2025-02-08T09:18:40Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/614 613,2839719616,Enhancing DeepSeek 70B Usability in Low-Resource Environments,"#### **Description** DeepSeek 70B is a powerful language model that performs exceptionally well on high-performance hardware (e.g., 8xA100 80GB). 
However, its deployment is challenging in low-resource environments (e.g., consumer GPUs or CPU-only servers). #### **Problem Statement** Currently, DeepSeek 70B has high VRAM requirements, making local deployment difficult for many small businesses and individual developers. Are there any plans to improve accessibility through the following optimizations? 1. **Multi-GPU Optimization for Low VRAM** - Implement more efficient model parallelization techniques (e.g., ZeRO-Offload, FlashAttention) to reduce memory consumption. 2. **Quantization Support** - Provide 4-bit or 8-bit quantized versions of the model, allowing it to run on consumer GPUs (e.g., RTX 3. **Optimized Inference API** - If local deployment remains costly, is there a plan to offer a more optimized cloud API (similar to OpenAI's API) with reduced cost for inference? #### **Expected Benefits** - Enables more developers to run DeepSeek 70B on local machines or small-scale servers, fostering adoption in small businesses and research communities. - Lowers deployment barriers, making open-source large models more practical and accessible. ",2025-02-08T08:47:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/613 610,2839513084,[BUG],"from dsk.api import DeepSeekApi No module named dsk.api How can I fix it? Thanks.",2025-02-08T04:27:19Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/610 609,2839412785,[BUG] A call for help in persuading the company to take DeepSeek accessibility issues seriously,"Hello! I am a visually impaired user of the DeepSeek service. Because visually impaired users cannot see the screen directly, we must use a screen reader to use the DeepSeek service. However, when I opened the DeepSeek page, I found that its accessibility is very poor and visually impaired users can hardly use it. I have therefore submitted fix requests through GitHub several times. Accessibility means developing web pages or applications so that not only people without disabilities but also visually impaired users like me can use them. Below are links to the related issues I submitted on GitHub: 1. [BUG] Accessibility Enhancement: Screen Reader Support for Toggle Buttons in DeepSeek Chat #246]( 2. [BUG] Accessibility Improvement for Screen Reader Users in DeepSeek v3 Chat Feature #233]( 3. 
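[BUG] Accessibility Issue with DeepSeek v3 - Impossible reCAPTCHA for Blind Users #220]( But the problem is that, although I raised these issues on GitHub, the service team has not responded or taken any action. This makes me wonder whether they are ignoring my feedback. Perhaps because people with disabilities are a minority, the company may not care much about these issues. In this situation, we therefore need to push for change more actively. If you are able to, please join in leaving messages for the DeepSeek company in China, urging them not to ignore the needs of users with disabilities. Otherwise, I believe DeepSeek will not take accessibility seriously, because it brings no direct economic benefit. In fact, the problems I raised are not hard to solve. I am a developer myself; I not only described the specific problems but also shared solutions, code, and implementation methods, yet they ignored all of it. Admittedly, among the issues I raised, they did show some interest in the one where the security CAPTCHA prevents screen reader users from logging in. That issue is still unresolved, but the others have been completely ignored. We therefore need more people to get involved and put more pressure on DeepSeek, so that they understand that a service which ignores users with disabilities can never be truly excellent. I sincerely ask everyone to take part and help bring about this change. Thank you!",2025-02-08T02:22:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/609 608,2839376353,Does the deepseek-v3 model not yet support json_object structured output? Requests keep returning 400 while the deepseek-v2.5 model works, ,2025-02-08T01:14:26Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/608 607,2838875403,docs(readme): improve table formatting and readability,"This PR optimizes table styling in README to: **Enhance visual consistency** - Unified column alignment with proper markdown pipe syntax - Fixed irregular spacing This change is ready for immediate merge as it contains no breaking changes. ",2025-02-07T19:00:42Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/607 606,2838499397,[BUG] API Error Report: Frequent JSONDecodeError with deepseek-reasoner Model,"**Issue Description** I am running a series of AI agents using langgraph. Each agent sends an API call to the deepseek-reasoner model. Sometimes it gets stuck at the first agent, sometimes at the second or third. I have also implemented **five retries per agent** but all five retries produce the same error. When making API calls to the deepseek-reasoner model, I am consistently receiving a JSONDecodeError: Expecting value error. This occurs even when the API returns an HTTP 200 status code, but the response body appears to be empty or malformed. 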
Here is a snippet of the code I am using: from langchain_openai import ChatOpenAI llm_deepseek = ChatOpenAI( model='deepseek-reasoner', base_url="" api_key="""", temperature=0, model_kwargs={""response_format"": {""type"": ""text""}} ) response = llm_deepseek.invoke(""formatted_prompt"") **Error Details** The full traceback of the error is as follows: JSONDecodeError: Expecting value: line 1 column 1 (char 0) **Steps Taken** To troubleshoot, I have: Verified that my API key is valid and has sufficient quota. Checked the API status page (status.deepseek.com) for any ongoing issues. Implemented retry logic and fallback mechanisms, but the issue persists. ",2025-02-07T15:53:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/606 605,2838187201,[Suggestion] Delete meaningless issues.,"### Just personal opinion. Many meaningless issues in the repo as I mentioned in , I'll just list out some couple of meaningless AD-like issues in this issue for you to delete or close them: 1. #601 answered. 2. #597 advertising. 3. #416 The writer doesn't seem to know what is a pull request. 4. #241 answered. 5. #231 answered. 6. #171 ~~What can I say?~~ Just meaningless. 7. #169 I'm a bit hesitant about this issue. Somehow I think a README is enough. 8. #23 answered. ……",2025-02-07T13:38:25Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/605 604,2837899871,General pre-processing question,"Hello everyone, The model is amazing but I have this question when sharing a pdf document with the model ,and ask a question about the pdf document. How DeepSeek handle the processing of the document? Has this part been shared also? Because tesseract or extracting the document text(fitz,PyPDF2 ,pdfplumber,..) is not always accurate. Is there any available documentation on how the pre-processing should be done as I am using the model locally? 
Thx",2025-02-07T11:15:03Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/604 603,2837853345,Does the plan support compute use?,"Claude 3.5 sonnet model has supported the feature of compute use, which can greatly improve the efficiency in ai programming plug-ins. deepseek is a great project, which has good code generation ability and hopes to support compute use.",2025-02-07T10:52:21Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/603 602,2837544807,Calling the deepseek api via http requests in java is abnormally slow; is there a solution?,"Calling the deepseek api via http requests in java is abnormally slow, and the speed is inconsistent. Neither okhttpclient nor httpclient requests solve the problem. I hope someone can advise.",2025-02-07T08:26:34Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/602 601,2837429482,The official WeChat group is full; could a second group be opened?,"RT Thanks",2025-02-07T07:22:39Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/601 600,2837309984,Minor grammatical tense corrections to README.md,Minor changes to correct grammatical tense for activities that took place in the past.,2025-02-07T06:02:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/600 599,2837243861,[BUG] Frequent JSONDecodeError with DeepSeek API," This problem frequently occurs when I use the deepseek-API and has been going on for several days. Even the deepseek sample request (shown below) has this error frequently. ",2025-02-07T05:02:44Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/599 597,2836399320,DeepSeek group,"Unofficial group, for learning and discussion only ",2025-02-06T19:30:57Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/597 596,2836320330,[BUG] I can't run DeepSeek V3 using SGlang,"**Describe the bug** When I run this code I get 404 - Not found. 
The api call is hiting the server: **To Reproduce** I run DeepSeek V3 into SGlang using this recipe (docker version): I'm using 4 cluster nodes with 4 Nvidia A100 each. Here is the command: In the other 3 hosts I change only the parameter **Expected behavior** Get the response using the API **Additional context** One strange behavior is that the server was up into 3rd node, not in the master. ",2025-02-06T18:46:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/596 595,2835490582,inference line 61 only use one token to predict a new one? not a sentence?, ,2025-02-06T13:08:36Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/595 594,2835459373,DeepSeek-V3-lite naming conventions?,"Hello, i am currently working on a pruned version of DeepSeek V3, The methodology involves layer wise routed expert pruning and distillation, then post training on the full model. I already tested the pipeline on DeepSeek V2 lite, bringing 64 experts to 16 experts and it seems to give correct results. I just started running the same method on Deepseek V3 with the following pruned target: Base Model: 256 => DeepSeek-V3-671B 22 => DeepSeek-V3-Lite-72B 16 => DeepSeek-V3-Lite-57B 8 => DeepSeek-V3-Lite-36B 4 => DeepSeek-V3-Lite-26B I'll upload them on huggingface when the pipeline finish to run (it should take about 3 days on my 2x3090 rig). Do you authorize me to adopt the naming convention as above for the uploads? If the methodology gives good result, i'll transfer it to the R1 and R1-Zero as well.",2025-02-06T12:55:33Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/594 593,2835201696,chore: add issue template config and fix documentation issues,"### 📖 Summary This pull request includes minor but meaningful improvements to the repository to enhance its maintainability, documentation, and user experience. ### 🛠️ Changes 1. 
**Add Issue Template Configuration** - This change allows users to either create blank issues or access support via the provided WeChat group link, improving accessibility for non-technical users or those seeking assistance outside of GitHub. 2. **Update Readme File** - Corrected the contact email link in the documentation to ensure it directs users to the correct address. - Adjusted the casing of the BibTeX citation to follow standard academic conventions, improving readability and professionalism in citations. ### 🚀 Impact - The addition of issue templates improves the overall contributor experience by reducing ambiguity and ensuring that issues are reported in a clear and actionable format. - Fixing the contact email link ensures that users can reach out for support without encountering errors. - Standardizing the BibTeX citation enhances the credibility and usability of the repository for academic and research purposes. --- Let me know if there are any questions or further clarifications needed 🙌",2025-02-06T10:58:55Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/593 592,2835085182,"[Paper BUG] Conflict between Figure 3, formula 21 and formula 22","The conflict is that Figure 3 and formula 22 indicate the input of $$TRM_k$$ is T-K token (k:T-k), while formula 22 indicates the input of $$TRM_k$$ is T-k token (1:T-k). Clearly, we can see this from Figure 3 Also, according to formula 21, since the word embedding is shifted by k, we can conclude that the token in i'th position of $$TRM_k$$'s input should be (i+k)'th token in the whole sentence. However, we can see the subscript of formula 22 is 1:T-k ",2025-02-06T10:08:06Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/592 591,2834805802,Error response from API: Expecting value: line 1 column 1 (char 0),"I still can not use the API for responses, even for the shortest context. Please fix that asap. 
Here is the full error: ",2025-02-06T07:54:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/591 590,2834745020,Suggestion regarding the busy server,"Could more servers be added? With such a large user base, users could alternatively wait in a queue instead of the API constantly returning errors.",2025-02-06T07:19:10Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/590 589,2834583451,Is it possible to send requests to deepseek directly through language,"**Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] **Describe the solution you'd like** A clear and concise description of what you want to happen. **Describe alternatives you've considered** A clear and concise description of any alternative solutions or features you've considered. **Additional context** Add any other context or screenshots about the feature request here. ",2025-02-06T05:28:55Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/589 588,2834499431,When calling the deepseek API the returned message says “Developed by OpenAI”,"Problem description When calling the deepseek API, the returned message says “Developed by OpenAI”. This may be an erroneous message, because I know the API is provided by your company, not OpenAI. Steps to reproduce Call your company's API endpoint via an HTTP request using an authorized API key. After sending the request, inspect the returned response. The returned message says “Developed by OpenAI”. Expected behavior When calling your company's API, the returned message should clearly state that the service is provided by your company, rather than showing “Developed by OpenAI”. Screenshots Additional information I confirm that I am using the API endpoint provided by your company and not another third-party service. This message may confuse users; I suggest fixing it as soon as possible so that users can correctly identify the source of the service. ",2025-02-06T04:14:20Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/588 587,2834452171,When will the API offer a batch feature?,"As the title says: please offer a batch feature similar to that of openai or qwen. Submitted requests would not need a real-time response, which would also help with server load. ",2025-02-06T03:37:01Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/587 586,2834399016,api response time is too long and it very often returns an empty string, ,2025-02-06T02:48:05Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/586 585,2833739082,R1 vs V3,What's the difference between DeepSeek R1 and DeepSeek V3?,2025-02-05T19:01:45Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/585 583,2833499965,[BUG] Replying or asking a new question usually says server is busy on web client,"**Describe the bug** Replying or asking a new question usually says server is busy **To Reproduce** 
ask a new question or reply **Expected behavior** to actually reply back instead of saying the server is busy **Screenshots** If applicable, add screenshots to help explain your problem. **Additional context** ",2025-02-05T17:05:32Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/583 581,2832984298,"[BUG] i upload a file it shows 100% uploading but it doesn't parsing file, even i wait for 5 minutes.","**Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior. **Expected behavior** A clear and concise description of what you expected to happen. **Screenshots** If applicable, add screenshots to help explain your problem. **Additional context** Add any other context about the problem here. ",2025-02-05T13:40:45Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/581 579,2832836181,Issue: Incorrect comment in `linear` function,"The comment in the function regarding contains inaccuracies: This comment has two problems: 1. **Incorrect ""quantized"" condition:** indicates that the tensor is **not** quantized. Quantized tensors have an element size of 1, while higher precision formats like float32 and bfloat16 have element sizes greater than 1. 2. **Misleading ""dequantized version"" phrase:** The code does not perform any dequantization when . It directly uses the original tensor. Proposed solution: Change the comment to accurately reflect the code's logic. A more accurate comment would be: This revised comment clarifies that no dequantization is performed and the original higher-precision weights are used directly when . ",2025-02-05T12:38:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/579 578,2832827060,[BUG]Screenshot upload is still failing,"**Describe the bug** For 2 days straight uploading a image fails, it can't even go to pending it fails right away to upload. 
**To Reproduce** just try to upload an image **Expected behavior** A clear and concise description of what you expected to happen. **Screenshots** ",2025-02-05T12:35:16Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/578 577,2832485625,"How many GPUs are needed to deploy deepseek-v3?",Since deepseek-v3 was trained in fp8 would about 700G of GPU memory be enough for the 671b model?,2025-02-05T10:08:57Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/577 576,2832483851,DeepSeek technical discussion group,"A beginner who wants to learn in more depth and hopes to meet others with the same interests ",2025-02-05T10:08:12Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/576 575,2832435039,The WeCom (Enterprise WeChat) group is full; please open another group.,"The WeCom (Enterprise WeChat) group is full; please open another group. ",2025-02-05T09:47:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/575 574,2832039211,Question about SMs partitions?,"Thank you for your insightful work on overlapping compute kernel and communication kernel. In your technical paper, you employ the warp specialization technique and partition 20 SMs into 10 communication channels. Here, I have a question on how to realize SMs partitions with Nvidia GPU. if using NCCL_MAX_NCHANNELS for communication kernels? then how it comes to compute kernels using the rest SMs? I appreciate any insights you can offer on this matter. Thank you for your assistance.",2025-02-05T06:33:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/574 572,2831789333,Add a pause feature to the web client,Generation cannot be paused so users have to wait a long time,2025-02-05T03:42:13Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/572 570,2831583984,Optimizing ,"Changes: init_distributed function: Extracted the distributed setup logic into a separate function. sample function: Modified it to use torch.multinomial instead of an exponentiation-based approach for sampling. Argument Validation: Replaced the assert with a more user-friendly validation in main to ensure that at least one of the parameters (input-file or interactive) is provided. 
Interactive Code Refactoring: The user interaction logic was kept, but the init_distributed function is now called separately at the beginning of main. Refactored init_distributed function: Extracted distributed setup logic into a separate function. Updated sample function: Replaced exponential approach with torch.multinomial for sampling. Improved argument validation: Replaced assert with a more user-friendly validation in main to ensure at least one parameter (input-file or interactive) is provided. Refactored interactive mode logic: Maintained user interaction logic but moved init_distributed call to the beginning of main. ",2025-02-05T00:35:17Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/570 569,2831495250,File upload support when web option is selected along with DeepThinkR1,"It will be great feature if we are able to upload file using the DeepThink(R1) with the web search option provided in the deepthink chat ",2025-02-04T23:27:21Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/569 566,2831063861,Please provide the code for your model architecture.,"**Is your feature request related to a problem? Please describe.** This repo only provides weights. It makes it difficult to confirm claims from the article. **Describe the solution you'd like** A repo where the code to the model architecture is provided. **Describe alternatives you've considered** Clearly state that the model is not open source. 
**Additional context** None ",2025-02-04T19:02:23Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/566 565,2830967849,updated Model Summary verbiage to be past tense for easier understanding,Title,2025-02-04T18:12:46Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/565 564,2830934952,Update requirements.txt,"The current pip library does not provide version 2.4.1 'touch' and version 3.0.0 'triton', and the 'requirements.txt' file has been updated to a minimum to meet the current pip installation requirements",2025-02-04T17:56:43Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/564 563,2830767351,fix(fp8_cast): Add robust memory management and error handling," - Add try-catch block for memory management operations - Implement graceful error handling for memory allocation failures - Add explicit CUDA memory cleanup - Protect against potential race conditions in file loading This change improves stability when converting large models by: - Preventing crashes from out-of-memory conditions - Ensuring proper cleanup of GPU resources - Adding error reporting for debugging",2025-02-04T16:37:15Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/563 562,2830318875,"[BUG] Encountering so much ""The server is busy. Please try again later."""," I am facing this so much. a little task cant be done but encounter ""The server is busy. Please try again later."" please fix it, users are experiencing a bad situation. users will go away. ",2025-02-04T13:44:58Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/562 560,2829729397,Huawei Ascend Benchmarking Reports,"**Is your feature request related to a problem? Please describe.** I'm looking for a comparison of Huawei Ascend throughput on NPUs versus SGLang on GPUs. **Describe the solution you'd like** Could someone point me towards data on this? 
Thanks ",2025-02-04T10:25:49Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/560 559,2829677323,[BUG] DeepSeek Web Unresponsive," ",2025-02-04T10:03:39Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/559 558,2829593744,"[BUG]ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao', 'bitnet']","Traceback (most recent call last): File line 3, in model = trust_remote_code=True) File line 559, in from_pretrained return model_class.from_pretrained( File line 3647, in from_pretrained config.quantization_config = AutoHfQuantizer.merge_quantization_configs( File line 173, in merge_quantization_configs quantization_config = AutoQuantizationConfig.from_dict(quantization_config) File line 97, in from_dict raise ValueError( ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao', 'bitnet']",2025-02-04T09:28:20Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/558 557,2829325263,[BUG]safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge,"python convert.py --hf-ckpt-path --save-path --n-experts 128 --model-parallel 4 0%| | [00:00 main(args.hf_ckpt_path, args.save_path, args.n_experts, args.model_parallel) File ""convert.py"", line 51, in main with safe_open(file_path, framework=""pt"", device=""cpu"") as f: safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge",2025-02-04T07:13:01Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/557 556,2829152356,Fix Linear Layer Bias Initialization,"## Description Fixed bias initialization in the Linear class by using instead of the undefined . This fix ensures proper bias initialization for all linear layers in the model. 
## Changes Made - Modified to use parameter for bias tensor initialization - Ensures consistency with parent and child classes (ColumnParallelLinear and RowParallelLinear) ## Why This Change is Needed The previous implementation tried to access which is only defined in child classes (ColumnParallelLinear), causing potential issues when the Linear class is used directly. Using is the correct approach as it's always available and matches the weight tensor's output dimension. ## Testing Done - Model initialization works correctly with bias enabled - Compatible with both standard and parallel linear layers - No impact on existing functionality ## Checklist - [x] Code follows the project's coding style - [x] Changes are backward compatible - [x] No new dependencies added",2025-02-04T05:10:17Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/556 554,2828171371,[BUG] File submission failed,"**Describe the bug** Upload of documents to chat keeps failing. **To Reproduce** Select upload button, choose a file i.e., PDF or docx within required limits and Open. **Expected behavior** Takes a second or two and displays ""Upload failed"". **Screenshots** ",2025-02-03T17:58:47Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/554 552,2828029726,The server is busy. Please try again later.,"The server is busy. Please try again later. fix this ",2025-02-03T16:48:28Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/552 551,2828013530,required voice mode similar to chatgpt,i m using chatgpt from short of time its will be good if deepseek provide the same voice mode ,2025-02-03T16:40:52Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/551 550,2827736315,Summarize Chat Title with Content,"**Is your feature request related to a problem? Please describe.** Currently, every chat's title is set to the first prompt the user sends. Hence, all the titles don't really explain the entire conversation. 
**Describe the solution you'd like** It would be great if the title could be a summary of the conversation (to be more general to what the user asked in the first prompt, and consistent with what the model's response was). **Describe alternatives you've considered** They can just be keywords if summarization is hard to achieve. **Additional context** Here, you can see that the title of the chat is the prompts I put in to the model. ",2025-02-03T14:49:35Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/550 549,2827070199,fixed typo and grammer,"FIXED issue #456 fixed typos and grammer",2025-02-03T10:13:30Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/549 548,2826844393,[BUG],"**Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior. **Expected behavior** A clear and concise description of what you expected to happen. **Screenshots** If applicable, add screenshots to help explain your problem. **Additional context** Add any other context about the problem here.",2025-02-03T09:02:06Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/548 547,2826733396,[BUG] torchrun subprocess received Signal 8 (SIGFPE),"**Describe the bug** **To Reproduce** ",2025-02-03T08:24:54Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/547 546,2826630590,Improve RTL (Right-to-Left) Text Display,"Dear DeepSeek Development Team, I’d like to report a display issue affecting RTL (right-to-left) languages such as Persian, Arabic, and others on your platform. Currently, texts in these languages do not render correctly due to improper text direction alignment. Issue Details: Problem: RTL texts (e.g., are displayed with incorrect text direction, causing misalignment and readability issues. Affected Element: The CSS class ds-markdown ds-markdown--block lacks proper RTL styling. 
Proposed Solution: Adding the CSS property direction: rtl; to the class ds-markdown ds-markdown--block will resolve the issue. This simple adjustment ensures RTL texts align correctly from right to left. Code Suggestion: css Copy .ds-markdown.ds-markdown--block { direction: rtl; } Expected Impact: Proper RTL text alignment for languages like Persian and Arabic. Improved readability and user experience for RTL language users. Additional Notes: This fix addresses fundamental RTL rendering but could be extended to handle numerals or punctuation if needed. Tested on [Your and confirmed resolution (optional: include technical details like OS, browser, or device).",2025-02-03T07:27:51Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/546 545,2826541207,DeepSeek Temporary Service Unavailability Error,"The DeepSeek service is temporarily unavailable due to technical issues. The error message indicates that the platform cannot access real-time information or the web at this time. To Reproduce Open DeepSeek. Attempt to search for real-time information. Observe the error message that indicates the service is unavailable. I am curious if anyone has used the 'Search' functionality within DeepSeek",2025-02-03T06:33:33Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/545 544,2826442096,[BUG] Unable to attach files,"Unable to attach files since Feb 2 afternoon. 
",2025-02-03T05:19:49Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/544 543,2826297420,Suggestion about image recognition,I hope image recognition can handle not only images that contain text but also images without text.,2025-02-03T03:26:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/543 542,2826213916,[BUG]Load model,"When I try to load the deepseek model: Attempt 1: ERROR 1: ValidationError: 1 validation error for VLLM Attempt 2: ERROR 2: Why does this happen and how can I solve it?",2025-02-03T02:05:56Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/542 540,2825935147,[FIXED] [BUG] API Platform under maintenance,"**Error Description** The DeepSeek API platform has been under maintenance for a long time. **Steps to Reproduce** 1. Try to access the DeepSeek API platform. **Expected Behavior** I expected to be able to access the platform or at least get clear information about the time the maintenance was completed. **Screenshots** **Additional Context** Too long a wait, suspected that the site is blocked in some ",2025-02-02T18:09:52Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/540 539,2825903196,GPU Inferencing: CUDA vs PTX,"For GPU inferencing, do you (Deepseek AI) use CUDA or PTX for your commercial service? Also, in general, for open source GPU inferencing software, do you advise using one or the other? What are the expected gains?",2025-02-02T17:03:14Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/539 538,2825894970,[BUG] File upload issue," ### NOT ABLE TO UPLOAD FILES **Whenever I upload a file it fails to upload ** **please fix this issue ** ",2025-02-02T16:45:15Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/538 536,2825890165,[BUG] can't upload documents,"**Describe the bug** I can't upload documents, it's stuck at 100% **To Reproduce** Steps to reproduce the behavior. **Expected behavior** A clear and concise description of what you expected to happen. **Screenshots** If applicable, add screenshots to help explain your problem. 
**Additional context** Add any other context about the problem here. ",2025-02-02T16:34:54Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/536 535,2825886842,[BUG] can't upload images,"**Describe the bug** uploading an image returns ""upload failed"" error **To Reproduce** 1. try to upload image **Expected behavior** normal image upload **Screenshots** **Additional context** uploading with vpn also doesn't work ",2025-02-02T16:27:57Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/535 534,2825867559,Server issue: I have been trying to access Deepseek for 3 days now,"I have been trying to access it for three days now, but I always get a server error response. What's happening, guys? Are you facing malicious attacks on your servers, or is it something else? ",2025-02-02T15:46:00Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/534 532,2825533249,[BUG] Not able to download model through HuggingFace,"**Describe the bug** Getting the following error: **To Reproduce** Following the code example on **Additional context** Refer to this PR: ",2025-02-02T01:05:24Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/issues/532 530,2825474431,"Saw the file to spy on it, hehehehe", ,2025-02-01T22:28:14Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/530 529,2825381326,eighty-column width on license,"Closes Suitable for reading on 80-column dumb terminals. A newline has been added on the last line.",2025-02-01T19:18:07Z,DeepSeek-V3,https://github.com/deepseek-ai/DeepSeek-V3/pull/529 526,2825319616,[BUG] XSS Vulnerability in DeepSeek AI,"A Cross-Site Scripting (XSS) vulnerability has been identified in DeepSeek AI, which allows an attacker to inject and execute arbitrary JavaScript code. This vulnerability could be exploited to compromise user sessions, steal sensitive information, or conduct phishing attacks. ### Steps to Reproduce: 1. 
Inject the following payload into an input field that reflects output without proper sanitization: