IdIssue,NumeroComentario,Comentario,DataComentario,AutorComentario,Tags 2854421980,2659967076,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2025-02-14T18:07:54Z,facebook-github-bot, 2854421980,2659967698,> ,2025-02-14T18:08:12Z,kasandell, 2810773400,2620481418,Currently llama3 is not working on windows due to this No module 'termios'` issue.,2025-01-29T02:12:37Z,aintel-vs, 2810773400,2699647848,"### Workaround Solutions for Windows Error **Root Cause** The occurs because is a Unix-only Python module (source). This breaks Windows compatibility in the current implementation. ### Verified Workarounds #### 1️⃣ Use Windows Subsystem for Linux (WSL) Install WSL and Ubuntu *Reference: source* --- ### Official Tracking - Progress on removing Unix dependencies: source - Related Hugging Face discussion: source **Important Note**: will fail on Windows - this module is intentionally unavailable in Windows environments. **Tested Environment** - Platform: Windows 11 - Python: 3.12.1 - Hardware: NVIDIA RTX 3060 (similar results expected for any DirectX 12 GPU) Let me know if you need help implementing any of these solutions! Also if you find this solved your issue mark this issue as closed.",2025-03-05T03:01:11Z,vatsalparikh07, 2802444663,2605361369,Salom,2025-01-21T17:37:58Z,Azizbek896, 2769043853,2571442804,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2025-01-04T23:51:53Z,facebook-github-bot, 2757260403,2645962708,Addresses #291,2025-02-08T22:12:57Z,vekoada, 2757260403,2645973845,"Thanks This addresses a real issue ( ) when running (Addresses #291). Turns out had , but was missing it. The command shows that this parameter is part of the intended configuration, and it's a straightforward fix. I'd say this is good to merge. 
It would be awesome if you could add a quick comment in the code just to explain what does.",2025-02-08T22:57:23Z,vekoada, 2757260403,2646044470,"This gets llama3.1 running, but it doesn't use the scaled rope",2025-02-09T03:08:03Z,galeselee, 2757260403,2646569959,"As far as I can tell - the actual logic for scaled_rope has not been added yet, causing the code to break when trying to run the model (as said). This was causing a lot of headaches (as can be seen online), so I added this quick fix to make the model usable without any parameters. The other two options seem to be: a) remove the scaled rope parameters from model.py and params.json entirely b) implement scaled rope logic This fix keeps those scaled_rope parameters there for anyone who wishes to implement or utilize scaled_rope [should it be added], while allowing other users to run the model without having to debug and make the changes on their own every time they clone the repo. - I can add a comment summarizing this for the time being. I did make an attempt at implementing scaled_rope, but I have more testing to do as this is not my niche. ",2025-02-09T20:21:55Z,jeremylaratro, 2753650164,2557899245,Hi,2024-12-21T00:05:17Z,melekSaadali, 2753650164,2603588434,Hi,2025-01-21T03:33:05Z,nwatab, 2751373439,2568739871,"As far as i have seen, nothing as such is mentioned. Since you have got your parents consent you are good to go",2025-01-03T06:16:08Z,D-Yuva, 2751373439,2608970801,"that was ststed in license agreement:"" **Licensee**” or “**you**” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf."" but i will use llama to chat like in huggingchat or spaces in huggingface and i don't know if i am a licensee or no also, if mayby my parents can enter on behalf me So what should I do ?",2025-01-23T06:33:02Z,youssef22112, 2740202087,2557151504,I also need batch inference. Any update on this?,2024-12-20T14:47:21Z,Pedrexus, 2740202087,2601007601,"I need it too but it seems that it is not recommended for the moment, and still a WIP from my understanding .. Some similar issues : - Hugging face - Hugging face ",2025-01-19T20:20:05Z,pjmalandrino, 2735684793,2539119504,I was able to resolve this issue by using a different region in the download request and using a VPN on my machine.,2024-12-12T14:31:49Z,alfiinyang, 2716536815,2525553782," what was the input text ?? ",2024-12-08T09:18:02Z,adi-ydv-1, 2716536815,2525675367,"if it comes from my side. -...a long wayover asshole miner provider..they Blocks out German..any year,.. and fucking silent. 3 parts i have located and saved cold ..and since april they want repeating passport controll. i hole her any°s how have see lov37ess(ne). ..mindblowing stuff..i want not realy talk about. iam a strong budhhist crowed allone and i think ..helping People give mind deepness. ..oh and my english is perfekt..i hobe that go clear to the d3vsWorld ",2024-12-08T11:37:29Z,lov37ess, 2716536815,2536646117, both of you together opened this issue ??,2024-12-11T17:31:45Z,adi-ydv-1, 2716536815,2543084536,"> > both of you together opened this issue ?? > > Now the exception was handled like this > > > > Input : If you don't have any memory, how did you agree to my [task]? 
What a chatbot means is that it cannot interact with you like a human, but every time, you have to remind the bot about what you are asking.",2024-12-14T12:22:52Z,adi-ydv-1, 2698285754,2600997059,"Hi, From my understanding, these parameters could allow to improve model context window size. They adjust rotation angle for frenquencies components. I found usage of these here model.py",2025-01-19T19:47:12Z,pjmalandrino, 2638107515,2466113370,You cannot upload a image to the llama..,2024-11-09T08:13:24Z,adi-ydv-1, 2638107515,2484788833,I have the same error. need help,2024-11-19T06:13:03Z,sleepingXd, 2638107515,2484955493,You cannot directly upload the image..provide a link to the source file .. ,2024-11-19T07:54:14Z,adi-ydv-1, 2638107515,2484973249,"> You cannot directly upload the image..provide a link to the source file .. I am not use image, only say ""hello"". get the same error. ",2024-11-19T08:03:59Z,sleepingXd, 2638107515,2485036583,"> > You cannot directly upload the image..provide a link to the source file .. > > I am not use image, only say ""hello"". get the same error. Have you tried reinstalling the llama ..and which model you are using ?",2024-11-19T08:36:45Z,adi-ydv-1, 2638107515,2485077392,"> > > You cannot directly upload the image..provide a link to the source file .. > > > > > > I am not use image, only say ""hello"". get the same error. > > Have you tried reinstalling the llama ..and which model you are using ? llama-guard-3-11b-vision",2024-11-19T08:55:50Z,sleepingXd, 2638107515,2525077921," I hope now your issue has been resolved llama got an update...??? ",2024-12-07T11:25:02Z,adi-ydv-1, 2623080685,2461017135,I am having the same issues and requested new URLs and tried to download models immediately. I used pip to install the llama-stack . While successfully installed there is not cli present. I am on an Intel chip with the latest Sequoia OS and in a Python 3.9 venv locally. I attempted to use the llama2's download.sh and that's when I hit the 403 forbidden error. Please help as I am stuck without access to the llama3 models.,2024-11-06T23:40:18Z,mhscentral, 2623080685,2516433742,"I have the same problem, have any of you found a solution yet? Please help. Resolving llama3-1.llamameta.net (llama3-1.llamameta.net)... 65.9.95.7, 65.9.95.37, 65.9.95.11, ... Connecting to llama3-1.llamameta.net (llama3-1.llamameta.net)|65.9.95.7|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2024-12-04 07 39 ERROR 403: Forbidden. ",2024-12-04T07:46:44Z,PodWooD, 2610314621,2434178568,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. 
If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-10-24T03:35:27Z,facebook-github-bot, 2604319037,2428324643," you need to use as the model-id",2024-10-22T05:58:16Z,ashwinb, 2604319037,2429845008,Same effect with that new name too ,2024-10-22T17:21:49Z,Travis-Barton, 2604319037,2430199483,"Resolved! should be Thanks again @ashwinb!!",2024-10-22T20:30:00Z,Travis-Barton, 2600902512,2426369728,"Hello I'd recommend you follow these download instructions, or these download instructions to download from Hugging Face.",2024-10-21T11:11:48Z,pcuenca, 2584602464,2654741718,?,2025-02-12T20:11:15Z,swiftclouddbs, 2584602464,2655435413,"> ? Sorry for the confusion; I had configured the training incorrectly and had forgotten to close the issue",2025-02-13T04:14:23Z,Chahnwoo, 2566767196,2397253396,Is anyone else facing this issue?,2024-10-07T15:29:53Z,vedanshthakkar, 2566767196,2399623850,same problem here! It's missing the important files like pytorch_model.bin What to do now?,2024-10-08T11:48:01Z,salmanjabbarkhan, 2566767196,2399696562,"Hello It looks like you downloaded the original Llama 3.2 checkpoints, which are suitable for use in codebases such as llama-stack or llama-stack. If you want to use the transformers APIs, you need to use the checkpoints in transformers format. Note that you don't have to download them first, the following will automatically download and cache Llama 3.2 1B for subsequent use: If you do want to download them locally, I'd recommend you use the Hugging Face Hub CLI tool like this: The command above will download the transformers checkpoint to a local directory called . ",2024-10-08T12:22:39Z,pcuenca, 2566767196,2400754764," Downloading from Hugging Face Hub CLI worked! Thanks.",2024-10-08T20:24:28Z,vedanshthakkar, 2561264504,2394038225,"You're probably in Europe like me Can't download it due to Meta license and EU laws.",2024-10-04T16:11:34Z,aviscido, 2561264504,2394784243, What about vpn? Would it help?,2024-10-04T23:25:10Z,NazaRik555, 2561264504,2395004645,I've tried with Mullvad and Nordvpn to no avail :(,2024-10-05T10:04:52Z,aviscido, 2561264504,2395005648,"Next step: I'll see if I can pull it from a VM in US, copy the models manually and then transfer them: although it will cost me some money for the VM and the data egress.",2024-10-05T10:07:56Z,aviscido, 2561264504,2440817189,"> Next step: I'll see if I can pull it from a VM in US, copy the models manually and then transfer them: although it will cost me some money for the VM and the data egress. Does it work so?",2024-10-28T08:08:18Z,NazaRik555, 2554690596,2395274461," Hi there, is this repo still active and PR acceptable ? I've found other repos like and are not the right place to follow.",2024-10-06T03:15:03Z,kuizhiqing, 2554212015,2380634502,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. 
Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-09-28T13:07:16Z,facebook-github-bot, 2554212015,2380652981,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-09-28T14:09:36Z,facebook-github-bot, 2547133199,2377172750, ,2024-09-26T14:41:50Z,efsotr, 2540367798,2365447538,Seems it should use vpn in some specific location.,2024-09-22T03:48:45Z,DavidZyy, 2531646603,2365202515,"I have the same issue too, please help ",2024-09-21T14:11:06Z,anhduckkzz, 2531646603,2372810724,I have the same issue too,2024-09-25T03:05:37Z,YAO-EE, 2531646603,2373342216,"> commented Guys I found out way to fix this problem You will need to install 1 more thing: pip install llama-stack",2024-09-25T07:59:02Z,anhduckkzz, 2531646603,2373767547,"> pip install llama-stack Thanks for sharing! I tested it, and it works perfectly. Appreciate the help! ",2024-09-25T11:03:49Z,nabilmohamed99, 2531646603,2568512584,It seems it is not working on Windows because lack of termios library on Windows. :(,2025-01-02T23:22:13Z,axelock, 2531646603,2604684671,"Even after installing ""pip install llama-stack"" - the error remains ""ModuleNotFoundError: No module named 'termios'"". Im using Windows",2025-01-21T13:05:44Z,Virk-TriMerge, 2531646603,2607252800,"> Even after installing ""pip install llama-stack"" - the error remains ""ModuleNotFoundError: No module named 'termios'"". Im using Windows Same error here on Windows after installing llama-stack... Any solution please?",2025-01-22T13:28:43Z,rmunjuluri, 2531646603,2614305541,same issue after using .,2025-01-26T09:58:50Z,mrsajadpp, 2531646603,2614562869,Same issue.,2025-01-26T19:35:44Z,dannychantszfong, 2531646603,2619342257,I have the same issue. Cannot find a workaround,2025-01-28T15:33:26Z,Antonio-John, 2531646603,2626321125,you can downlad llama model from ,2025-01-31T05:14:23Z,mrsajadpp, 2512311215,2336632916,The reason was because of the layer normalization. Sorry!,2024-09-08T10:32:14Z,veritas9872, 2505049544,2332100392,"Which llama 3 model version are you using? (Number of parameters?) What device are you using? I'm using a 8 GB macBook M2 Pro with 512 GB SSD and was able to get an instant response from the chat query as well as the API call.",2024-09-05T15:54:53Z,ADITYA1720, 2505049544,2332313690,"My model is llama3.1 8B and I have done in Mac also, I'm getting same problem. My system is MacBook Pro i7, 32GB RAM and 512GB SSD. Can you provide me your model name and your code, so I can cross check and let you know. On Thu, 5 Sept, 2024, 21:25 Aditya Jadhav, *** wrote: > Which llama 3 model version are you using? (Number of parameters?) > > What device are you using? > I'm using a 8 GB macBook M2 Pro with 512 GB SSD and was able to get an > instant response from the chat query as well as the API call. > > — > Reply to this email directly, view it on GitHub > < > or unsubscribe > < > . > You are receiving this because you authored the thread.Message ID: > *** > --   *DISCLAIMER* Any views or opinions presented in this email are solely those of the author and do not necessarily represent those of the Ganpat University.   ",2024-09-05T17:47:54Z,sneh20122001, 2501365140,2363257139,I found the reason my self. resultchers planed batch processing in Llama3.1.,2024-09-20T09:17:22Z,Sion1225, 2493315662,2322609754,"Hi , this should be a typo. 
DP should be 8, so we have TP * CP * PP * DP=16384, and DP * Batch * Seq.Len. = 16M. Ask internally to submit a fix for next arxiv paper update.",2024-08-30T23:56:16Z,jianyuh, 2493315662,2328216046,"> Hi , this should be a typo. DP should be 8, so we have TP * CP * PP * DP=16384, and DP * Batch * Seq.Len. = 16M. Ask internally to submit a fix for next arxiv paper update. Thanks for your reply, that makes sense.",2024-09-04T08:18:39Z,kisseternity, 2481063778,2304952292,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-08-22T15:14:39Z,facebook-github-bot, 2476418913,2365066829,"Issue 1 you probably downloaded all 8 models. 8b 8b instruct 70b 70b instruct 405b 405b instruct thats 6 models. who knows really what your talking about. lets focus on the 8b model ity has this fiels if yoru download form meta and to run it you would type this: torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path --max_seq_len 128 --max_batch_size 4 however taht then runs on one 24gb gpu. since the 70b models are larger you need more vram. as it will nto fit on one 24gb gpu.maybe 8? or two 80gb A100gpus. in which hcase the mdoel would be sharded over 8 gpus or two gpus. issue 2. the model is just in different format one is huggingface transformer format and the other is some meta format or pytorch or whatever NGPUS=8 PYTHONPATH=$(git rev-parse --show-toplevel) torchrun --nproc_per_node=$NGPUS $CHECKPOINT_DIR --model_parallel_size $NGPUSsomthing.",2024-09-21T08:53:52Z,mylesgoose, 2466873888,2297149140,"LLaMA models, aren’t designed for tasks where you need to fill in gaps in code like CodeLlama can. CodeLlama is tailored for this kind of job—it's great at completing code snippets or filling in missing parts of code. While LLaMA models excel at tasks like generating text and summarizing information, they’re not specifically built for code completion. If your main goal is to handle code fill-ins, you might want to look at CodeLlama or other tools that specialize in that area.",2024-08-19T18:10:39Z,Alokbpandey, 2466873888,2336545258,"+1. Would love to get an answer on this. If this is not supported, will there be a codellama 3.1? Thank you.",2024-09-08T04:45:47Z,morew4rd, 2466873888,2337013961,"LLaMA currently does not support Fill-in-the-Middle (FIM) functionality. As for CodeLlama 3.1, there haven't been any official announcements regarding its release yet. However, given Meta's continued development in this space, it is possible that future models might include enhanced features for code and text generation​. On Sun, 8 Sep, 2024, 10:16 moreward, *** wrote: > +1. Would love to get an answer on this. If this is not supported, will > there be a codellama 3.1? > > Thank you. 
> > — > Reply to this email directly, view it on GitHub > < > or unsubscribe > < > . > You are receiving this because you commented.Message ID: > *** > ",2024-09-09T02:58:07Z,Alokbpandey, 2465037213,2290373690, pip install --upgrade transformers,2024-08-15T02:18:06Z,TheRoadQaQ, 2464671890,2486873583,"LLaMA 3.1 has been optimized for Tensor Core GPUs as part of its integration with NVIDIA's AI tools. By default, the LLaMA 3.1 models leverage NVIDIA TensorRT-LLM, which is designed to accelerate inference using Tensor Cores. These optimizations include features like in-flight batching, key-value caching, and quantization to lower precision (e.g., INT4 or FP8) for increased performance and efficiency. Additionally, it supports advanced techniques such as rotary position embeddings and scaled multi-GPU inference​ Out of the box, LLaMA 3.1 works efficiently on NVIDIA GPUs with Tensor Core support, making it ideal for applications requiring high-performance inference on supported hardware",2024-11-19T22:23:15Z,srjsunny, 2460787361,2283763114,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-08-12T11:52:06Z,facebook-github-bot, 2460787361,2283800801,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-08-12T12:09:26Z,facebook-github-bot, 2460787361,2339516689,Can you please review and respond.,2024-09-10T03:04:50Z,nehalmr, 2459294839,2307522097,"I run into the same problem trying to download 3.1-70B with new approved URL and the issue confirmed persists with newly generated ones. All environment dependencies follow instructions on Meta site confirmed installed without error. Appreciate any help to resolve. hmark:~$ llama download --source meta --model-id Llama-3-70B UserWarning: Field ""model_id"" has conflict with protected namespace ""model_"". You may be able to resolve this warning by setting . warnings.warn( Please provide the signed URL you received via email (e.g., https Downloading ... 
Traceback (most recent call last): File line 8, in sys.exit(main()) File line 54, in main parser.run(args) File line 48, in run args.func(args) File line 174, in run_download_cmd _meta_download(model, meta_url) File line 143, in _meta_download asyncio.run(downloader.download()) File line 44, in run return loop.run_until_complete(main) File line 649, in run_until_complete return future.result() File line 260, in download await self.get_file_info(client) File line 249, in get_file_info response.raise_for_status() File line 761, in raise_for_status raise HTTPStatusError(message, request=request, response=self) httpx.HTTPStatusError: Client error '403 Forbidden' for url ' For more information check: Additional note Model ID get from: usage: llama download [-h] --source {meta,huggingface} [--model-id {Llama-2-7b,Llama-2-13b,Llama-2-70b,Llama-2-7b-chat,Llama-2-13b-chat,Llama-2-70b-chat,Llama-3-8B,Llama-3-70B,Llama-3-8B-Instruct,Llama-3-70B-Instruct,Meta-Llama3.1-8B,Meta-Llama3.1-70B,Meta-Llama3.1-405B bf16-mp16,Meta-Llama3.1-8B-Instruct,Meta-Llama3.1-70B-Instruct,Meta-Llama3.1-405B-Instruct bf16-mp16,Llama-Guard-3-8B,Llama-Guard-3-8B:int8-mp1,Prompt-Guard-86M}] ",2024-08-23T17:33:28Z,jollybutterfly, 2459294839,2308496884,"Title: Unable to Download Meta Llama 3.1 Model Using Provided URL Description: I’m having trouble downloading the Llama 3.1 model using the URL provided by Meta. The URL consistently returns a 403 Forbidden error. Here are the steps I’ve taken: 1. Requested a New URL: • I’ve requested a new download URL multiple times to ensure it wasn’t an expiration issue. • Each time, the URL still returns a 403 error when attempting to download via curl and wget. 2. Checked System Setup: • Verified that all dependencies and environmental variables are correctly set up on my system. • Tested download commands using both curl and wget, but both return the same error. 3. Used Direct Browser Access: • Attempted to open the URL directly in the browser, which also resulted in an error message indicating “Missing Key-Pair Id query parameter or cookie value.” Minimal Reproducible Example: Here is the command I tried using in cmd: curl -o llama_model.zip "" I also tried: wget -O llama_model.zip "" Output: Warning: wildcards not supported in HTTP. --2024-08-24 13 47-- Resolving llama3-1.llamameta.net (llama3-1.llamameta.net)... 3.163.80.90, 3.163.80.110, 3.163.80.49, ... Connecting to llama3-1.llamameta.net (llama3-1.llamameta.net)|3.163.80.90|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2024-08-24 13 47 ERROR 403: Forbidden. Runtime Environment: • Model: Llama-3.1 • Using via huggingface?: No • OS: Windows 10 • Tools Used: curl, wget Additional Context: I’ve attempted to download this model multiple times with different URLs provided by Meta, but the issue remains consistent. Any advice or further troubleshooting steps would be greatly appreciated.",2024-08-24T18:57:48Z,AuthorDustin, 2459294839,2461026523," How did you manage to install the llama cli as installing the llama-stack didn't already create this for me? > > hmark:~$ llama download --source meta --model-id Llama-3-70B UserWarning: Field ""model_id"" has conflict with protected namespace ""model_"". 
> ",2024-11-06T23:48:56Z,mhscentral, 2459294839,2508928074," ls CODE_OF_CONDUCT.md download.sh example_chat_completion.py LICENSE Llama3_Repo.jpeg README.md setup.py CONTRIBUTING.md eval_details.md example_text_completion.py llama MODEL_CARD.md requirements.txt USE_POLICY.md $ Enter the URL from email: Enter the list of models to download without spaces (8B,8B-instruct,70B,70B-instruct), or press Enter for all: llama-3.2-1B Downloading LICENSE and Acceptable Usage Policy --2024-11-30 06 04-- Resolving llama3-2-lightweight.llamameta.net (llama3-2-lightweight.llamameta.net)... 108.158.20.103, 108.158.20.92, 108.158.20.99, ... Connecting to llama3-2-lightweight.llamameta.net (llama3-2-lightweight.llamameta.net)|108.158.20.103|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2024-11-30 06 06 ERROR 403: Forbidden. I tried to restart the network, but it didn't work. Even I tried to use llama download --source meta --model-id Llama-3-1B It only showed my path to my current folder. Does anyone know how to deal with this problem?",2024-11-30T11:22:14Z,Wang-Ying-Yi, 2459294839,2613782572,"Title: Unable to download the llama 3 model on windows. Description: I am trying to download the llama 3 model on my windows machine using the provided link to download it but it is not being operated or downloaded on windows machine. Lakho>llama model list --show-all Traceback (most recent call last): File """", line 198, in _run_module_as_main File """", line 88, in _run_code File line 4, in File line 7, in from llama_stack.distribution.library_client import ( # noqa: F401 File line 32, in from llama_stack.distribution.build import print_pip_install_help File line 24, in from llama_stack.distribution.utils.exec import run_command, run_with_pty File line 10, in import pty File line 12, in import tty File line 5, in from termios import * ModuleNotFoundError: No module named 'termios'` this is what I am encountering any help will be appreciated. ",2025-01-25T05:01:47Z,Ayaz-75, 2459294839,2658942075,"Unable to Download Meta Llama-2-13b-chat Model Using Provided URL. Tried re-requesting URL, using wget as well as curl. But its not working. Getting 403 forbidden error for URL. below is the status after running the command **llama model download --source meta --model-id Llama-2-13b-chat** Downloading checklist.chk 100.0% bytes - 0 00 Failed: 0.0% bytes - - -- Downloading params.json 100.0% bytes - 0 00 Downloading consolidated.00.pth 100.0% GB - 0 00 ",2025-02-14T10:38:32Z,AnjithaKAnjitha, 2459137678,2282188013,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. 
If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-08-10T15:34:21Z,facebook-github-bot, 2457687452,2277725237, ,2024-08-09T11:19:55Z,haseebrj17, 2457687452,2406920949,"I don't know if we faced the same problem but looks similar. Maybe it is caused by the code I used the same method and find that the pad_token's id is 128001 but the max is 128000. It triggered the assertion failed. Then I use the and the problem solved. ",2024-10-11T08:40:16Z,YiboZhao624, 2456333446,2276362496,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-08-08T17:52:26Z,facebook-github-bot, 2456333446,2276394852,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-08-08T18:12:23Z,facebook-github-bot, 2451868560,2274035875,"There are few things which might have happened here 1.Make sure your repository structure is clear, and your files are situated in proper locations.(try relative paths in import if not so) 2.Ensure you have the necessary dependencies installed. 3.If u are working on VS Code i suggest you to check if proper interpreter is selected(The one with the Virtual env u have created and installed dependency) 4.Refer to the documentation or README file of the LLAMA repository for specific setup instructions. I don't know the exact problem ,these might be some of the fixes. Explain in detail ,for more info..",2024-08-07T18:03:48Z,ChiruChirag, 2451868560,2274445021,"> There are few things which might have happened here 1.Make sure your repository structure is clear, and your files are situated in proper locations.(try relative paths in import if not so) 2.Ensure you have the necessary dependencies installed. 3.If u are working on VS Code i suggest you to check if proper interpreter is selected(The one with the Virtual env u have created and installed dependency) 4.Refer to the documentation or README file of the LLAMA repository for specific setup instructions. > > I don't know the exact problem ,these might be some of the fixes. Explain in detail ,for more info.. Hello bro: When I trying to setting up the visual environment, I used “pip install -r requirements.txt“ words to install the toolkits in pycharm. But here is a error about ""llama"" model in the top of code in text_tokinezer.py: import os from unittest import TestCase from llama.tokenizer import ChatFormat, Tokenizer This ""llama"" in ""from llama.tokenizer import ChatFormat, Tokenizer"" are not recognized. And the error is: Unresolved reference 'llama' Unresolved reference 'ChatFormat' Unresolved reference 'Tokenizer' Before I truing to install llama model in project, but can not find it. 
And the project was downloaded directly from me on GitHub. Mat share me how to solve sych problem? ",2024-08-07T22:19:14Z,12dc32d, 2441881121,2269559657,"To use LLAMA3 on a smartphone, you can follow these steps and use the following tools: 1. **Web-Based Interface**: - One of the simplest ways to use LLAMA3 on a smartphone is through a web-based interface. If there's a web application that interfaces with LLAMA3, you can access it via a mobile browser. 2. **Mobile Apps**: - Look for mobile apps that integrate with LLAMA3. Some apps might offer API integration with LLAMA3, allowing you to use its capabilities directly on your smartphone. 3. **Develop Your Own Mobile App**: - If you are a developer, you can create a mobile app that utilizes LLAMA3. Here's a high-level overview of the steps: - **Backend API**: Set up a backend server that runs LLAMA3 and exposes its functionalities through an API. - **Mobile App**: Develop a mobile app using frameworks like React Native, Flutter, or native development. The app can make API calls to your backend server to interact with LLAMA3. - **Hosting**: Host your backend server on a cloud platform like AWS, Google Cloud, or Heroku to make it accessible from anywhere. 4. **Use Jupyter Notebooks on Mobile**: - You can use tools like Juno for iOS or other Jupyter notebook apps available for Android to run Python code on your mobile device. This might not be as efficient as using a dedicated app or web interface, but it can work for experimentation and small tasks. 5. **Cloud-Based Solutions**: - Leverage cloud-based platforms that offer APIs for machine learning models. Services like Hugging Face or Google Colab can be used to run LLAMA3 in the cloud and access it from your smartphone. ### Example Tools and Libraries - **Hugging Face API**: Hugging Face provides APIs to interact with various models, including LLAMA3. - **Google Colab**: Hugging Face allows you to run Jupyter notebooks in the cloud, which you can access from a mobile device. - **Streamlit**: If there's a Streamlit app running LLAMA3, you can access it through your mobile browser. These are some of the ways and tools you can use to work with LLAMA3 on a smartphone. ",2024-08-05T17:27:31Z,ChiruChirag, 2441881121,2270424065,"Thank you so much. it is really helpful! I prefer quantized model for edge device uses",2024-08-06T05:37:39Z,Mattral, 2441881121,2270462786,"Now I use is in termux.Using ollama in termux and download some model.So I can use LLM in my Android! ",2024-08-06T06:11:53Z,yhnz1234, 2441881121,2274015025,"> Thank you so much. it is really helpful! I prefer quantized model for edge device uses Great!! LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA) are some of the good techniques for quantization",2024-08-07T17:53:43Z,ChiruChirag, 2437193545,2259139214,"Check this: You need to upgrade transformers. I solved it with transformers==4.43.1",2024-07-30T20:18:44Z,LeonardoArnone, 2437193545,2260108018,"### Try This Or install the latest version of transformers",2024-07-31T09:45:25Z,Antony-M1, 2437193545,2274667972,"> Check this: You need to upgrade transformers. I solved it with transformers==4.43.1 Yes it is working. thank you for the suggestion",2024-08-08T00:59:01Z,Bhavyatashah97, 2437193545,2288069487,"Tried upgrading transformers library. Still throwing error. Any other suggestions? 
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) in 14 15 # Set up the text generation pipeline with the specified configuration ---> 16 pipeline = pipeline( 17 ""text-generation"", 18 model=model_name, in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs) 780 781 # Config is the primordial information item. --> 782 # Instantiate config if needed 783 if isinstance(config, str): 784 config = AutoConfig.from_pretrained( in from_pretrained(cls, pretrained_model_name_or_path, **kwargs) in from_dict(cls, config_dict, **kwargs) 772 config = cls(**config_dict) 773 --> 774 if hasattr(config, ""pruned_heads""): 775 config.pruned_heads = {int(key): value for key, value in config.pruned_heads.items()} 776 in __init__(self, vocab_size, hidden_size, intermediate_size, num_hidden_layers, num_attention_heads, num_key_value_heads, hidden_act, max_position_embeddings, initializer_range, rms_norm_eps, use_cache, pad_token_id, bos_token_id, eos_token_id, pretraining_tp, tie_word_embeddings, rope_theta, rope_scaling, attention_bias, attention_dropout, **kwargs) 158 eos_token_id=2, 159 pretraining_tp=1, --> 160 tie_word_embeddings=False, 161 rope_theta=10000.0, 162 rope_scaling=None, in _rope_scaling_validation(self) 178 179 self.num_key_value_heads = num_key_value_heads --> 180 self.hidden_act = hidden_act 181 self.initializer_range = initializer_range 182 self.rms_norm_eps = rms_norm_eps ValueError: must be a dictionary with with two fields, and , got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}",2024-08-14T07:44:46Z,harshitsoni1997, 2437193545,2288086430,"> Tried upgrading transformers library. Still throwing error. Any other suggestions? > > ValueError Traceback (most recent call last) in 14 15 # Set up the text generation pipeline with the specified configuration ---> 16 pipeline = pipeline( 17 ""text-generation"", 18 model=model_name, > > in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs) 780 781 # Config is the primordial information item. 
--> 782 # Instantiate config if needed 783 if isinstance(config, str): 784 config = AutoConfig.from_pretrained( > > in from_pretrained(cls, pretrained_model_name_or_path, **kwargs) > > in from_dict(cls, config_dict, **kwargs) 772 config = cls(**config_dict) 773 --> 774 if hasattr(config, ""pruned_heads""): 775 config.pruned_heads = {int(key): value for key, value in config.pruned_heads.items()} 776 > > in **init**(self, vocab_size, hidden_size, intermediate_size, num_hidden_layers, num_attention_heads, num_key_value_heads, hidden_act, max_position_embeddings, initializer_range, rms_norm_eps, use_cache, pad_token_id, bos_token_id, eos_token_id, pretraining_tp, tie_word_embeddings, rope_theta, rope_scaling, attention_bias, attention_dropout, **kwargs) 158 eos_token_id=2, 159 pretraining_tp=1, --> 160 tie_word_embeddings=False, 161 rope_theta=10000.0, 162 rope_scaling=None, > > in _rope_scaling_validation(self) 178 179 self.num_key_value_heads = num_key_value_heads --> 180 self.hidden_act = hidden_act 181 self.initializer_range = initializer_range 182 self.rms_norm_eps = rms_norm_eps > > ValueError: must be a dictionary with with two fields, and , got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'} I faced the same issue. Try to restart the kernel. This worked for me.",2024-08-14T07:55:05Z,Rumeysakeskin, 2437193545,2288423664,"It worked. The new update of vllm==0.5.4 has fixed this bug. - Upgrade transformers to the latest version - Upgrade vllm to the latest version - Restart the kernel",2024-08-14T10:44:32Z,harshitsoni1997, 2437193545,2303257588,"I received this error in Colab. I upgraded transformers and still got the error. Then I noticed I was on a CPU instance. I switched to a GPU instance and it no longer throws the error. ",2024-08-21T23:01:59Z,bigrobinson, 2437193545,2358266278,"> I received this error in Colab. I upgraded transformers and still got the error. Then I noticed I was on a CPU instance. I switched to a GPU instance and it no longer throws the error. > > ` > model_id = > > pipeline = transformers.pipeline( > ""text-generation"", > model=model_id, > model_kwargs={""torch_dtype"": torch.bfloat16}, > device_map=""auto"", > ) > ` how to restart kernel",2024-09-18T11:55:17Z,GuardSkill, 2437193545,2369039541,"In the upper right in Colab in the dropdown next to ""Connect"", you can change runtime type. This will automatically shutdown and start a new kernel. Under ""Runtime"" in the main menu (upper left), you can ""Restart session"" to restart the kernel.",2024-09-23T18:20:56Z,bigrobinson, 2437193545,2489753230,Does anyone know how to make it compatible with transformers==4.42 or earlier? I can’t upgrade transformers due to compatibility issues with another package.,2024-11-20T23:33:28Z,friendshipkim, 2437193545,2500077044, Perhaps try adapting (cf. ,2024-11-26T09:17:37Z,SnzFor16Min, 2436437820,2264572629,Any updates on this?,2024-08-02T05:13:38Z,BakingBrains, 2434947053,2258862549,Hi! Thanks for your question! We used our internal eval implementation to generate those metrics instead of relying on the public lm_evaluation_harness library. Here is a summary of our lm_evaluation_harness and we also published the evaluation result details as datasets in the lm_evaluation_harness Hugging Face collections for you to review.,2024-07-30T17:32:15Z,wukaixingxp, 2434947053,2279950919,"I guess that you used the wrong model for evaluation. 
According to the Llama3.1-Evals, they use model instead of the backbone. When I run the evaluation with instruct model (8B), the results look fine. | Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr| |----------------------------|-------|------|----- |---|-----:| |leaderboard_gpqa | | | | | | | | | - leaderboard_gpqa_diamond | 1|none | 0|acc_norm|↑ |0.3636|± |0.0343| | - leaderboard_gpqa_extended| 1|none | 0|acc_norm|↑ |0.3223|± |0.0200| | - leaderboard_gpqa_main | 1|none | 0|acc_norm|↑ |0.3214|± |0.0221| Also, you should remove . Then, the results can be close to the original paper (32.8). | Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr| |----------------------------|-------|------|----- |---|-----:| |leaderboard_gpqa | | | | | | | | | - leaderboard_gpqa_diamond | 1|none | 0|acc_norm|↑ |0.3232|± |0.0333| | - leaderboard_gpqa_extended| 1|none | 0|acc_norm|↑ |0.3040|± |0.0197| | - leaderboard_gpqa_main | 1|none | 0|acc_norm|↑ |0.3304|± |0.0222| The below is my command: ",2024-08-10T07:20:42Z,sherlcok314159, 2434947053,2306321119,"I ran your command and this is what i got | Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr| |----------------------------|-------|------|----- |---|-----:| |leaderboard_gpqa | | | | | | | | | - leaderboard_gpqa_diamond | 1|none | 0|acc_norm|↑ |0.3434|± |0.0338| | - leaderboard_gpqa_extended| 1|none | 0|acc_norm|↑ |0.3095|± |0.0198| | - leaderboard_gpqa_main | 1|none | 0|acc_norm|↑ |0.3371|± |0.0224| Thank you! For other cases with few-shot examples, I should also remove the chat-template arguments, correct?",2024-08-23T05:29:41Z,sorobedio, 2434947053,2307398057, We have developed a eval reproduce recipe to run our published 3.1 evals Hugging Face datasets with eval reproduce recipe. Please take a look and hopefully it can be helpful to you. ,2024-08-23T16:14:31Z,wukaixingxp, 2434947053,2307403532,"There are some differences on our eval implementation and OpenLLM leaderboard v2 as stated in previous eval reproduce recipe, to reproduce the leaderboard result, please take a look at this section",2024-08-23T16:18:09Z,wukaixingxp, 2434947053,2309304803,thank you,2024-08-26T04:44:28Z,sorobedio, 2433961390,2255951531,Thank you for reaching out - I will let the main paper authors know for future updates. ,2024-07-29T13:27:50Z,jspisak, 2433869937,2254471198,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-07-28T10:48:04Z,facebook-github-bot, 2433869937,2254475823,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-07-28T11:04:13Z,facebook-github-bot, 2433660598,2254283641,"Hi Thank you for your pull request and welcome to our community. 
# Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-07-27T23:43:10Z,facebook-github-bot, 2433660598,2254289089,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-07-28T00:10:22Z,facebook-github-bot, 2432493312,2253021195,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-07-26T15:39:14Z,facebook-github-bot, 2432493312,2253075785,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-07-26T16:12:31Z,facebook-github-bot, 2431333317,2252386076,"same error ",2024-07-26T09:53:53Z,Skylarking, 2431333317,2253532942,"I also faced the same error above. We can fix it by the following method. If we compare the params.json file of Meta-Llama-3-8B and Meta-Llama-3.1-8B , we could find that there is an extra param defined called ""use_scaled_rope"": true. A quick fix is to remove this extra parameter from the file and then it will run successfully. Not sure whether this is a permanent solution though. cat params.json {""dim"": 4096, ""ffn_dim_multiplier"": 1.3, ""multiple_of"": 1024, ""n_heads"": 32, ""n_kv_heads"": 8, ""n_layers"": 32, ""norm_eps"": 1e-05, ""rope_theta"": 500000.0, ""use_scaled_rope"": true, ""vocab_size"": 128256} NCCL_DEBUG=INFO torchrun --nproc_per_node=1 example_text_completion.py --ckpt_dir Meta-Llama-3.1-8B --tokenizer_path > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at _C._set_default_tensor_type(t) Loaded in 16.99 seconds I believe the meaning of life is > to love God with all your heart, mind, soul, and strength and to love your neighbor as yourself. 
I believe that the only way to have a relationship with God is through Jesus Christ and that God is the only one who can save us from sin. I believe that the Bible is the inspired Word of God and ================================== Simply put, the theory of relativity states that > 1) the laws of physics are the same for all non-accelerating observers, and 2) the speed of light in a vacuum is the same for all observers. The former is known as “the principle of relativity,” while the latter is known as “the constancy of the speed of light.” ================================== A brief message congratulating the team on the launch: Hi everyone, I just > wanted to take a moment to congratulate you on the launch of the new website. I think it looks great, and I am sure it will be a big hit with the rest of the team. It's great to see so much hard work and dedication going into this project. ================================== Translate English to French: sea otter => loutre de mer peppermint => menthe poivrée plush girafe => girafe peluche cheese => > fromage macaroni => macaroni chicken => poulet cookies => biscuits carrot => carotte broccoli => brocoli cauliflower => chou-fleur tomato => tomate zucchini => courgette potato => pomme de terre ================================== ",2024-07-26T21:29:37Z,krishna1803, 2431333317,2255505367,"A simple solution is to add '**use_scaled_rope**' to line 33 of the file Just add: ",2024-07-29T09:58:50Z,zeeshanhayder, 2431333317,2256581058,"> A simple solution is to add '**use_scaled_rope**' to line 33 of the file > > Just add: > > Thank you!! This fixed it for me :)",2024-07-29T18:01:34Z,markcoatsworth, 2431333317,2435495818,I ran into this error with and the solution by worked.,2024-10-24T14:42:32Z,mrakgr, 2431333317,2573409373," Thank you for identifying the root cause of the issue. In the meantime, before an official fix is released, you can use the monkey patch function below to address it. ",2025-01-06T16:00:06Z,nwatab, 2431333317,2604653275," Thanks! I wonder where this official source is ",2025-01-21T12:52:23Z,galeselee, 2431333317,2604768255," If you're referring to the monkey patch, it's not from an official source—I wrote it myself.",2025-01-21T13:40:21Z,nwatab, 2431333317,2645963619,"Pull request #372 opened by on Dec 23, 2024 uses solution. Still pending official review",2025-02-08T22:16:13Z,vekoada, 2429202855,2249668302, can you give try one more time with new URL and provide the request id?,2024-07-25T07:41:39Z,samuelselvan, 2429202855,2249703454,"> can you give try one more time with new URL and provide the request id? Request ID: 1710880222981693 But problems still exist. DESKTOP-021DSF0 MINGW64 (main) $ Enter the URL from email: Enter the list of models to download without spaces (8B,8B-instruct,70B,70B-instruct), or press Enter for all: 8B Downloading LICENSE and Acceptable Usage Policy --2024-07-25 16 06-- Resolving llama3-1.llamameta.net (llama3-1.llamameta.net)... 13.33.183.15, 13.33.183.99, 13.33.183.66, ... Connecting to llama3-1.llamameta.net (llama3-1.llamameta.net)|13.33.183.15|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2024-07-25 16 08 ERROR 403: Forbidden. --2024-07-25 16 08-- Reusing existing connection to llama3-1.llamameta.net:443. HTTP request sent, awaiting response... 403 Forbidden 2024-07-25 16 08 ERROR 403: Forbidden. DESKTOP-021DSF0 MINGW64 (main) $ ",2024-07-25T08:00:44Z,Juvarunst, 2429202855,2249707530,"> can you give try one more time with new URL and provide the request id? 
",2024-07-25T08:02:49Z,Juvarunst, 2429202855,2250190347,same question ,2024-07-25T12:20:11Z,foo1s, 2429202855,2252106650,same issue here,2024-07-26T07:03:21Z,zhonghe0615, 2429202855,2252601560,"Same issue. -------- I resolved this. It was my error as I has cloned llama3. This is also clear from the prompt that offers llama3 models. So, resolution is to clone the correct repo that is then run the donwnload.sh from the llama3_1 folder. Download with url key worked perfectly.",2024-07-26T11:53:43Z,AnttiMJohansson, 2429202855,2254858160,"> ## Same issue. > I resolved this. It was my error as I has cloned llama3. This is also clear from the prompt that offers llama3 models. > > So, resolution is to clone the correct repo that is then run the donwnload.sh from the llama3_1 folder. Download with url key worked perfectly. I follwed as your steps, but still failed!! damon git clone Cloning into 'llama-models'... remote: Enumerating objects: 162, done. remote: Counting objects: 100% done. remote: Compressing objects: 100% done. remote: Total 162 (delta 47), reused 38 (delta 33), pack-reused 90 Receiving objects: 100% 1.31 MiB | 2.47 done. Resolving deltas: 100% done. damon cd damon ls CODE_OF_CONDUCT.md CONTRIBUTING.md docs Llama_Repo.jpeg MANIFEST.in models pyproject.toml README.md requirements.txt setup.py damon cd damon ls __init__.py llama2 llama3 llama3_1 damon cd damon ls api download.sh eval_details.md __init__.py LICENSE MODEL_CARD.md README.md requirements.txt USE_POLICY.md damon bash download.sh Enter the URL from email: **** Model list *** - meta-llama-3.1-405b - meta-llama-3.1-70b - meta-llama-3.1-8b - meta-llama-guard-3-8b - prompt-guard Choose the model to download: meta-llama-3.1-8b Selected model: meta-llama-3.1-8b **** Available models to download: *** - meta-llama-3.1-8b-instruct - meta-llama-3.1-8b Enter the list of models to download without spaces or press Enter for all: Downloading LICENSE and Acceptable Usage Policy --2024-07-29 11 01-- Resolving llama3-1.llamameta.net (llama3-1.llamameta.net)... 108.138.246.122, 108.138.246.20, 108.138.246.81, ... Connecting to llama3-1.llamameta.net (llama3-1.llamameta.net)|108.138.246.122|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2024-07-29 11 02 ERROR 403: Forbidden. damon ",2024-07-29T03:10:13Z,damonChenzf07, 2429202855,2255428498,"guys, i use aws to pull and in llama3_1 to run downloads.sh, and copy the url. It works. Next is to scp the downloaded checkpoint to my local machine. Which is a solution to avoid the 403 issue.",2024-07-29T09:22:04Z,YanJiaHuan, 2429202855,2268293275,"I use this method and it works. 1. Go to your llama folder (e.g. cd llama3) 2. List files (ls) 3. Remove download.sh (rm download.sh) 4. Go to 5. Copy data from download.sh (copy raw file) 6. Use nano to create download.sh (nano download.sh) 7. Paste data from memory (ctrl+v) 8. End nano and write to file (ctrl+x) 9. Add permision to download.sh (chmod +x download.sh) 10. Run download.sh Rest the same as before - paste link from meta email and next choice model: **** Model list *** meta-llama-3.1-405b meta-llama-3.1-70b meta-llama-3.1-8b meta-llama-guard-3-8b prompt-guard Choose the model to download: meta-llama-3.1-8b Selected model: meta-llama-3.1-8b **** Available models to download: *** meta-llama-3.1-8b-instruct meta-llama-3.1-8b",2024-08-05T06:42:22Z,Piotr-rogal, 2429202855,2282637466,"> are you on wsl or what? 
i had to run WSL2: - sudo apt update sudo apt install --reinstall net-tools - sudo apt install net-tools and in windows i ran: - netsh winsock reset then restart pc ***back in WSL2 - wsl ip addr - sudo rm sudo bash -c 'echo ""nameserver 8.8.8.8"" > sudo bash -c 'echo ""nameserver 8.8.4.4"" >> - sudo nano #Add the following lines to prevent auto-generation of resolv.conf: [network] generateResolvConf = false #then type the foloowing ctrl+x #it will say something like do you want to save before exiting and type y for yes n for no - y #hit enter #you should be done but test - wsl --shutdown wsl - ping 8.8.8.8 - ping google.com - at least that resolved my issue... i tried to update like sudo apt update and it was not downloading anything from the jammy jank. so i was like what the deuce?! ",2024-08-11T06:06:03Z,Max-Headspace, 2429202855,2379468106,"I'm getting this error. `Traceback (most recent call last): File """", line 198, in _run_module_as_main File """", line 88, in _run_code File line 7, in File line 44, in main parser.run(args) File line 38, in run args.func(args) File line 174, in run_download_cmd _meta_download(model, meta_url) File line 143, in _meta_download asyncio.run(downloader.download()) File line 194, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 687, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ File line 263, in download await self.get_file_info(client) File line 252, in get_file_info response.raise_for_status() File line 763, in raise_for_status raise HTTPStatusError(message, request=request, response=self) httpx.HTTPStatusError: Client error '403 Forbidden' for url ' For more information check: ",2024-09-27T14:54:43Z,rasibweb, 2429202855,2385248260,"hey bro i fixed it!after tried many times。 we need ensure the model we download is equal to the model you chose for example you cant download the llama3 by using the llama2 url or key anyways the model is a hug success for students like me who is interested in the language interacting ",2024-10-01T09:15:36Z,AUPU-bot, 2429202855,2385356663,"Got it. Thanks 😊 On Tue, Oct 1, 2024, 3:16 PM AUPU-bot *** wrote: > hey bro i fixed it!after tried many times。 > we need ensure the model we download is equal to the model you chose > for example you cant download the llama3 by using the llama2 url or key > anyways the model is a hug success for students like me who is interested > in the language interacting > > — > Reply to this email directly, view it on GitHub > < > or unsubscribe > < > . > You are receiving this because you commented.Message ID: > *** > ",2024-10-01T10:01:38Z,rasibweb, 2429202855,2508930633,"> I use this method and it works. > > 1. Go to your llama folder (e.g. cd llama3) > 2. List files (ls) > 3. Remove download.sh (rm download.sh) > 4. Go to > 5. Copy data from download.sh (copy raw file) > 6. Use nano to create download.sh (nano download.sh) > 7. Paste data from memory (ctrl+v) > 8. End nano and write to file (ctrl+x) > 9. Add permision to download.sh (chmod +x download.sh) > 10. 
Run download.sh > > Rest the same as before - paste link from meta email and next choice model: > > **** Model list *** > > ` > meta-llama-3.1-405b > meta-llama-3.1-70b > meta-llama-3.1-8b > meta-llama-guard-3-8b > prompt-guard > Choose the model to download: meta-llama-3.1-8b > ` > > Selected model: meta-llama-3.1-8b > > **** Available models to download: *** > > ` > meta-llama-3.1-8b-instruct > meta-llama-3.1-8b > ` I tried this way it still shows the same error 403 Forbidden",2024-11-30T11:31:53Z,Wang-Ying-Yi, 2429122726,2249551038,Huh,2024-07-25T06:24:04Z,Walkingcaffeine, 2428904338,2249670385,Can you paste the logs that shows? I can try and see what is going on.,2024-07-25T07:42:50Z,samuelselvan, 2428904338,2250325610,"I have the same problem. I specify the model I wish to download, and then it displays 'Downloading LICENSE and Acceptable Usage Policy' before abruptly quitting bash. ",2024-07-25T13:28:40Z,bmillns-d, 2428875434,2249609047,"Hello Might be resolved if you upgrade to transformers 4.43.2, see for reference.",2024-07-25T07:06:35Z,pcuenca, 2428875434,2249707068,"Thanks I tried transformers 4.43.2 but same error occurs again... I read over the fix on transformers, it only add a check on if torch has dtype fp8_e4m3fn, the fix would work for torch <2.1 but not for 2.4 I guess, 2.4.0 should have the fp8 dtype already. seems like it somehow passed the hasattr(torch, 'float8_e4m3fn') check but torch still couldnt find Float8_e4m3fnStorage... let me check if there might be some error on my pytorch version?",2024-07-25T08:02:34Z,Corsky, 2428875434,2249717074,"I assume you are running on FP8 hardware (H100), right? (I believe you'd get a different error if you are not). Other than that, I'm not sure if you'd need to upgrade to cuda 12.",2024-07-25T08:07:36Z,pcuenca, 2428875434,2249729077," I'm using H800 now, it do support FP8 I think, and yes the error is different when I tried on A100 at the first time. Let me try if it works on cuda 12",2024-07-25T08:13:10Z,Corsky, 2428544051,2255748016,I think this is a mistake in the paper too~,2024-07-29T12:03:00Z,ykddd, 2428544051,2257785773,any findings?,2024-07-30T08:34:02Z,YanJiaHuan, 2428544051,2268114759,"As defined in hidden_dim is initialized to 4h,and then determined by ffn_dim_multiplier and multiple_of ",2024-08-05T03:45:05Z,ykddd, 2428526322,2248951251,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. 
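Returning to the hidden_dim question above: a worked sketch of how the reference model.py appears to derive the FFN width from the 4h starting point via ffn_dim_multiplier and multiple_of. The concrete numbers (dim=4096, ffn_dim_multiplier=1.3, multiple_of=1024) are assumed to be the 8B values, so confirm them against your own params.json.

```python
# Sketch (assumed to mirror the reference model.py): FFN hidden size from 4h.
def ffn_hidden_dim(dim: int, ffn_dim_multiplier: float | None, multiple_of: int) -> int:
    hidden_dim = 4 * dim                  # start from 4h
    hidden_dim = int(2 * hidden_dim / 3)  # SwiGLU keeps roughly 2/3 of it
    if ffn_dim_multiplier is not None:
        hidden_dim = int(ffn_dim_multiplier * hidden_dim)
    # round up to the nearest multiple_of
    return multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

# Assumed Llama 3 8B values from params.json: dim=4096, multiplier=1.3, multiple_of=1024
print(ffn_hidden_dim(4096, 1.3, 1024))  # -> 14336
```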
If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-07-24T21:43:56Z,facebook-github-bot, 2428089405,2260994509,"Please refer to the download script for 3.1 here I'll include a note on the script, thanks for flagging it!",2024-07-31T17:16:32Z,subramen, 2427100186,2249299956,"> What are the memory footprints (GB) of > > * Llama-3.1-8B > > * Llama-3.1-70B > > * Llama-3.1-405B > > * Llama-3-8B > > * Llama-3-70B > > > models and hardware specifications required to run the models? personal PC use 8B.,memory needs 16gb-32gb. enterprise computer use 70B, 405B",2024-07-25T03:39:00Z,CodeMagic6, 2426441060,2258883077,Hi! Thanks for your question! We used our internal eval implementation to generate those metrics instead of relying on the public lm_evaluation_harness library. Here is a summary of our lm_evaluation_harness and we also published the evaluation result details as datasets in the lm_evaluation_harness Hugging Face collections for you to review. I believe lm_evaluation_harness may help you. ,2024-07-30T17:44:58Z,wukaixingxp, 2426441060,2259621166,"This is super helpful! Thank you, Kai! ",2024-07-31T04:29:42Z,jasonkrone, 2426059352,2246333914,"I was able to download them from the llama-models repo (after also getting an error when trying the download.sh from llama3): ",2024-07-23T21:17:01Z,Quasimondo, 2426059352,2247107739,"> I was able to download them from the llama-models repo (after also getting an error when trying the download.sh from llama3): > > I will try that, thanks 😃 ",2024-07-24T07:32:15Z,numoh, 2416404095,2260986815,I'm not aware of such a constraint. Can you share more details on how this impacts your work?,2024-07-31T17:11:41Z,subramen, 2414857912,2234907916,"> Hi, eval_details.md says that MATH is evaluated with maj Does maj means the majority class accuracy That really confuses me as there is are so many classes in MATH, and calculating the major class does not seem meaningful. Can you give a clearer explanation on the evaluation metric? Also, how do you judge the correctness of a response in MATH? Do you use the evaluation codes in the MATH repo?",2024-07-18T00:32:42Z,NagisaZj, 2412261109,2232038726,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-07-17T00:05:44Z,facebook-github-bot, 2406536907,2228249983,"> ",2024-07-15T11:11:17Z,Amir231123, 2399688879,2219664461,Hi in your torchrun call you need to specify the --nproc_per_node to your number of GPU. 
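To make this concrete, here is a small sketch (not part of the repo) that counts the consolidated.*.pth shards in a checkpoint folder and prints a matching torchrun invocation; a mismatch between shard count and world size is what produces errors like "Loading a checkpoint for MP=1 but world size is 8". The folder path is a placeholder, and the example script name and flag are written from memory of the repo examples, so treat them as assumptions.

```python
# Hypothetical helper: infer model-parallel size from the number of checkpoint shards.
from pathlib import Path
import sys

ckpt_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "Meta-Llama-3-8B-Instruct")  # placeholder path
shards = sorted(ckpt_dir.glob("consolidated.*.pth"))
if not shards:
    print(f"No consolidated.*.pth shards found in {ckpt_dir}")
else:
    mp = len(shards)  # 1 shard for 8B, 8 shards for 70B
    print(f"{mp} shard(s) found; launch with: "
          f"torchrun --nproc_per_node {mp} example_chat_completion.py --ckpt_dir {ckpt_dir} ...")
```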
It will spin up a process for each GPU to split the model.,2024-07-10T06:29:07Z,mreso, 2399688879,2220350375,"The same problem, when I set the --nproc_per_node to 8, it will get an error Loading a checkpoint for MP=1 but world size is 8"".",2024-07-10T12:11:39Z,ISADORAyt, 2399688879,2220394990,"> Hi in your torchrun call you need to specify the --nproc_per_node to your number of GPU. It will spin up a process for each GPU to split the model. Yes, I have tried that but it will output the assertion failure exactly the same in another comment. I think that the problem is due to Llama3-8B-Instruct only has one checkpoint file? So how does set nproc_per_node will help, or more specifically, how can we solve this? Thank you!",2024-07-10T12:32:11Z,DerrickYLJ, 2399688879,2220395503,Sorry wasn't paying attention that was loading the 8B model. The code in this repo is only able to load the 8B on a single GPU and the 70B model on 8 GPUs. To run different splits you'll need to look into different engine like vllm which you can either run standalone or through TorchServe's integration ,2024-07-10T12:32:29Z,mreso, 2399688879,2220399096,"> I think that the problem is due to Llama3-8B-Instruct only has one checkpoint file? So how does set nproc_per_node will help, or more specifically, how can we solve this? Please see above, I misread your initial post. ",2024-07-10T12:34:26Z,mreso, 2399688879,2290510231,"Same issue! Could you please tell me how you solved this problem? I have 4 GPUs. Is that true that this repo code is only able to load the 8B on a single GPU, not any else numbers ,like 4? Thank you so much!",2024-08-15T03:46:17Z,Lululzz7, 2399688879,2290511434," > Sorry wasn't paying attention that was loading the 8B model. The code in this repo is only able to load the 8B on a single GPU and the 70B model on 8 GPUs. To run different splits you'll need to look into different engine like vllm which you can either run standalone or through TorchServe's integration Same issue! Is that true that this repo code is only able to load the 8B on a single GPU, not any else numbers ,like 4? Thank you so much! Is there other ways to cope with this problem?",2024-08-15T03:47:21Z,Lululzz7, 2399688879,2635051615,im having the same issue any updates here? ,2025-02-04T20:53:07Z,Stephnn0, 2396752838,2217300624, What implementation are you using? I would suggest asking in the library specific instead. We do not provide a implementation in this repo. Feel free to reopen if there are further questions.,2024-07-09T10:42:40Z,mreso, 2396752838,2218679537,"No this happens with just Automodelforcasuallm loading. even before processing peft.",2024-07-09T20:33:12Z,abpani, 2396752838,2219674612,Hi AutoModel is a class. The transformer implementation in this repo is separate from it. You will most likely find help here: ,2024-07-10T06:36:28Z,mreso, 2393116559,2240074695,"> Srart",2024-07-19T20:41:59Z,Amir231123, 2391361130,2245380218,I am also facing the same issue.,2024-07-23T14:15:58Z,SujitJustineBarwa, 2390030889,2219955400,"Hi historical tokens are represented by the kv cache state, see: ",2024-07-10T09:02:28Z,mreso, 2390030889,2220056575,"> Hi historical tokens are represented by the kv cache state, see: > > Hi , thanks for your reply. If I am not using cache, can I simply do this? Thanks",2024-07-10T09:52:46Z,ZeroAGI, 2390030889,2220070743,"Not sure why you would want to deactivate kv caching. It is a technique to accelerate token generation in transformer models. A great explanation of this technique can e.g. 
be found in this blog post ",2024-07-10T09:59:46Z,mreso, 2387566155,2204997584,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-07-03T03:18:32Z,facebook-github-bot, 2385576036,2202286142, ,2024-07-02T08:20:00Z,JethroChow, 2385576036,2206873676," Thanks for your effort and time but we are not accepting PRs to add or modify the functionality of the model. Moreover, since Llama is a decoder-only model I'm not sure how effective these embeddings are in practice. ",2024-07-03T17:35:20Z,subramen, 2379702868,2206808184,"Hi, can you try making your system prompt more explicit on how much the model should respond? When you say ""act"", it probably direct the LLM to act out a whole script. I would try something like sys: ""you are an experienced.... Your responses must be not more than one sentence"". ",2024-07-03T17:04:23Z,subramen, 2379702868,2207010026,"> Hi, can you try making your system prompt more explicit on how much the model should respond? When you say ""act"", it probably direct the LLM to act out a whole script. I would try something like sys: ""you are an experienced.... Your responses must be not more than one sentence"". I tried but nothing works in my side. Can you please share your code so that I may get help from it?",2024-07-03T19:09:52Z,fahim9778, 2371720232,2206854399,cc - can you comment here?,2024-07-03T17:22:41Z,jspisak, 2371720232,2262602668,"This same issue applies to Llama 3.1 8B Instruct. See: Relevant comment: > The other thing we noticed when discussing these results with the Meta team is that the instruction tuning of the model makes it ignore in context learning: it's no longer able to follow the minerva answer format, hence why most answers count as false. _Originally posted by in ",2024-08-01T09:39:13Z,JCRPaquin, 2371720232,2344224608," i also cannot reproduce llama3.1-8B on from 0 to 16 shots. 0shot gives 0.33 accuracy but more shots hurt. I do not know why.",2024-09-11T17:04:33Z,yananchen1989, 2368667400,2185094804,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. 
If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-06-23T16:04:05Z,facebook-github-bot, 2368667400,2185156576,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-06-23T17:04:10Z,facebook-github-bot, 2367556916,2206811854,"Cannot tell what the issue is without more details. Closing this now, please open a new issue following the provided issue template including information about your runtime platform. ",2024-07-03T17:06:42Z,subramen, 2367556916,2315491634,How was it solved? ,2024-08-28T14:26:53Z,eccstartup, 2365845044,2206813978,You should be able to seamlessly switch if using transformers. Please share the code you're running,2024-07-03T17:08:04Z,subramen, 2365845044,2207974105,"> You should be able to seamlessly switch if using transformers. Please share the code you're running Hello and thank you for replying. The code went wrong when loading the llama And the llama3-8B I use is the version downloaded from meta website. Is that possible something went wrong when downloading llama?",2024-07-04T03:15:19Z,Summoningg, 2365845044,2260984218,The code snippet you shared should work. Can you confirm what is the value of args.llama_model? It should be something like if you are using the HF api,2024-07-31T17:10:03Z,subramen, 2365845044,2568627239,"> The code snippet you shared should work. Can you confirm what is the value of args.llama_model? It should be something like if you are using the HF api I have same question. In my code, the value of args.llama_model is the model path I downloaded ",2025-01-03T02:41:19Z,Tzx11, 2363628197,2179965628,"I remove from in and the problem seems to be solved. ",2024-06-20T07:05:29Z,YueChenkkk, 2363384302,2184110559,"Sorry for the late reply in continuation from #242; typed up an answer and the page refreshed and got rid of my response *facepalm*. I suggest finding some non-CUDA dependent code and figuring out how to ""cut"" CUDA out of this codebase if you really want to use this code. Here are a few good resources that may help: - May have some conditional code to check if CUDA is available or not. - Same idea, may be able to find out where the inference is occurring and MAYBE even use - If you really want to go deep, look into the source code for llama.cpp, Ollama, or specifically the LangChain source code and see what happens when you set AND Unfortunately even state-of-the-art models are very finicky and not *too* well documented compared to normal software projects. I've had to dig around in source code for hours for this type of stuff lol. Good luck!",2024-06-22T17:05:17Z,JeffreyLind3, 2363384302,2185432779,"> Sorry for the late reply in continuation from #242; typed up an answer and the page refreshed and got rid of my response _facepalm_. I suggest finding some non-CUDA dependent code and figuring out how to ""cut"" CUDA out of this codebase if you really want to use this code. Here are a few good resources that may help: > > * May have some conditional code to check if CUDA is available or not. > * Same idea, may be able to find out where the inference is occurring and MAYBE even use > * If you really want to go deep, look into the source code for llama.cpp, Ollama, or specifically the LangChain source code and see what happens when you set AND > > Unfortunately even state-of-the-art models are very finicky and not _too_ well documented compared to normal software projects. 
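In the spirit of that advice, a minimal device-selection sketch (not from this repo, whose reference code assumes CUDA and NCCL): prefer CUDA when present, fall back to Apple MPS, otherwise CPU. This only helps if you are adapting the code or using the Hugging Face port yourself.

```python
import torch

def pick_device() -> torch.device:
    # Prefer CUDA, then Apple's MPS backend, then plain CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(2, 3, device=device)  # models and tensors are then moved to `device`
print(f"using {device}, sample tensor on {x.device}")
```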
I've had to dig around in source code for hours for this type of stuff lol. Good luck! > Sorry for the late reply in continuation from #242; typed up an answer and the page refreshed and got rid of my response _facepalm_. I suggest finding some non-CUDA dependent code and figuring out how to ""cut"" CUDA out of this codebase if you really want to use this code. Here are a few good resources that may help: > > * May have some conditional code to check if CUDA is available or not. > * Same idea, may be able to find out where the inference is occurring and MAYBE even use > * If you really want to go deep, look into the source code for llama.cpp, Ollama, or specifically the LangChain source code and see what happens when you set AND > > Unfortunately even state-of-the-art models are very finicky and not _too_ well documented compared to normal software projects. I've had to dig around in source code for hours for this type of stuff lol. Good luck! Thank you bro. This change is difficult, my boss has equipped a new desktop computer for work (not just for me, but for the company, and I am currently using it). Your suggestions and links are very helpful to me, I have read those references carefully and plan to try to understand and use these knowledge about the framework in my free time.",2024-06-24T01:41:22Z,12dc32d, 2362900841,2179253919,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-06-19T18:00:36Z,facebook-github-bot, 2358440318,2215492931,Hi have you figured out the solution? I have the same question! Thank you!,2024-07-08T23:02:59Z,KexinGAO42, 2352832951,2178163194,"same with you ",2024-06-19T09:08:17Z,easybrad, 2352832951,2179656175,"If you could add more information about your problem, I may be able to help - I had a 'couldn't find path' issue today I had to debug",2024-06-20T01:31:20Z,JeffreyLind3, 2352832951,2179695683,"> If you could add more information about your problem, I may be able to help - I had a 'couldn't find path' issue today I had to debug Thank you bro. I already find the Original file and here has tokenizer model and checkpoint file. But the one more question is my tablet uses Intel integrated graphics, it can not support cuda (only nvdia can) driver. I downloaded the intel graphics driver but I don't know how to modify the code to make the llama3 model calculate on the intel graphics card. Every time it shows that cuda is required. May share any ideas as you want?",2024-06-20T02:25:12Z,12dc32d, 2352832951,2179697902,"> same with you You may click and download the first file. 
It is original file include tokenizer_model, other 2 files are checkpoint_model.",2024-06-20T02:26:55Z,12dc32d, 2352832951,2180452157,"> > If you could add more information about your problem, I may be able to help - I had a 'couldn't find path' issue today I had to debug > > Thank you bro. I already find the Original file and here has tokenizer model and checkpoint file. But the one more question is my tablet uses Intel integrated graphics, it can not support cuda (only nvdia can) driver. I downloaded the intel graphics driver but I don't know how to modify the code to make the llama3 model calculate on the intel graphics card. Every time it shows that cuda is required. May share any ideas as you want? This link may be helpful, and could adapt to this repository...unfortunately I'm not an expert with this stuff lol. With integrated graphics though, you may want to explore solutions like llama.cpp or ollama, as they are better at adapting to hardware, especially non-specialist graphics units.",2024-06-20T11:31:49Z,JeffreyLind3, 2352832951,2181839920,"Okay....thanks. My Gpu is Intel(R) UHD Graphics, this is an iGPU, all of videos resource about Intel gpu are dGPU (The famous one is Intel Arc). The Intel Arc series is Intel's recently launched high-performance graphics card series, designed to compete with NVIDIA and AMD's discrete graphics cards. Actually, what I want to ask is how to modify the llama3 code so that it can run on Intel GPU. Although I have disabled cuda on PyCharm using the statement [os.environ[""CUDA_VISIBLE_DEVICES""] = """"], every time running, it prompts me that cuda is not available, instead of considering using Intel GPU. If you have solved similar problems before or have any ideas, please have a pleasant communication. ",2024-06-21T01:55:54Z,12dc32d, 2352832951,2181840297,"Okay....thanks. My Gpu is Intel(R) UHD Graphics, this is an iGPU, all of videos resource about Intel gpu are dGPU (The famous one is Intel Arc). The Intel Arc series is Intel's recently launched high-performance graphics card series, designed to compete with NVIDIA and AMD's discrete graphics cards. Actually, what I want to ask is how to modify the llama3 code so that it can run on Intel GPU. Although I have disabled cuda on PyCharm using the statement [os.environ[""CUDA_VISIBLE_DEVICES""] = """"], every time running, it prompts me that cuda is not available, instead of considering using Intel GPU. If you have solved similar problems before or have any ideas, please have a pleasant communication. ",2024-06-21T01:56:28Z,12dc32d, 2342530391,2195209092,Maybe could be a useful reference?,2024-06-27T16:52:02Z,awgu, 2342530391,2206882198,"+1 to torchtitan. We also have a guide here Closing this issue, please reopen if you need any clarification!",2024-07-03T17:41:24Z,subramen, 2336982395,2183584319,I am getting forbidden as well,2024-06-21T23:41:36Z,Eyesun23, 2336982395,2187324834, are you using any proxy any chance?,2024-06-24T20:13:03Z,samuelselvan, 2336447375,2150651989,"Hi thanks for you question! Based on how BPE tokenizer work, that would be the case. Still, depending on what you are trying to implement, it's important to note that special tokens and out of vocabulary tokens don't follow this rule and there might be additional edge cases you would need to consider. Thanks,",2024-06-05T18:04:45Z,albertodepaola, 2336447375,2152047532,Thanks . Appreciate your quick response. ,2024-06-06T11:06:56Z,spookyQubit, 2335765043,2150593521,"Hi what would be the issues you are seeing? 
With some modifications the code in this repository could work on windows natively, but the code is made to work on Linux. You can use WSL on windows to execute this code as well. Thanks for asking and feel free to provide additional information.",2024-06-05T17:30:32Z,albertodepaola, 2335765043,2151565961,"> Hi what would be the issues you are seeing? With some modifications the code in this repository could work on windows natively, but the code is made to work on Linux. You can use WSL on windows to execute this code as well. Thanks for asking and feel free to provide additional information. Hi I managed to install on Windows, however my system performed slowly since I could not utilize the power of two GPUs; only one GPU was handling the load. Installing and utilizing Llama 2 70 B, Llama 3 70 B, LLaMA 2 30 B (FP16), or lesser sizes that would function flawlessly on my machine, is the major goal. I recently installed WSL; could you please walk me through the process?",2024-06-06T07:13:14Z,kirushake, 2335765043,2229209389,Hey can you try using it with Olamma ??? ,2024-07-15T19:15:56Z,ajayspatil7, 2335099440,2150528648,"I suggest asking this on the ollama repo. I can see they have some documentation here Closing this issue as it is unrelated to the meta-llama repo",2024-06-05T16:54:00Z,subramen, 2334348111,2150632537,"Generally LLMs including Llama are not precise on numerical data and arithmetic, and so any numerical analysis is susceptible to hallucinations. ",2024-06-05T17:54:08Z,subramen, 2334348111,2151879274,Thank you.,2024-06-06T09:55:15Z,mzeesam, 2332352722,2146408899,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-06-04T01:44:48Z,facebook-github-bot, 2332352722,2162876734,the agreement...,2024-06-12T12:24:06Z,lluisagusti, 2331489313,2150634532,"Hi, when you download the weights using , it downloads the files you have shown. Since you are looking for , are you trying to use the weights with Hugging Face APIs? If yes, you will need to convert the weights to HF format by using use the script. We have an example notebook on our Llama-recipes Github Repo that you could refer to that showcases how to use already converted weights as well as converting them yourself. You can also check out how to example notebook.",2024-06-05T17:55:25Z,fbnav, 2331489313,2151988524," , Thank you for your hints, I tried to convert by using convert_llama_weights_to_hf.py, but got a new error. RuntimeError: Internal: could not parse ModelProto from ",2024-06-06T10:49:26Z,UserName-wang, 2331489313,2151990215," , Thank you for your hints, I tried to convert by using convert_llama_weights_to_hf.py, but got a new error. 
RuntimeError: Internal: could not parse ModelProto from ",2024-06-06T10:49:56Z,UserName-wang, 2331489313,2184131330,"I tried successfully using ""python --input_di Meta-Llama-3-8B-Instruct --model_size 8B --output_dir output --llama_version 3"" ",2024-06-22T17:57:05Z,qianzhouyi2, 2329645503,2143844467,"the error information is: RuntimeError: Internal: [model_proto->ParseFromArray(serialized.data(), serialized.size())] It means the tokenizer.model is probably not ok",2024-06-02T13:01:55Z,liutao053877, 2329645503,2151301115,"Hello, have you solved this problem? I am getting the same error.",2024-06-06T02:26:33Z,XiangwenXiao, 2329645503,2194134496,+1,2024-06-27T08:48:00Z,baiyuting, 2329429090,2143668006,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-06-02T02:10:42Z,facebook-github-bot, 2329300874,2150628574,"You might want to ask on the transformers repo as this is specific to their API. I thought it would have an arg but I don't see it on their docs. You could use the arg like cc for HF expertise",2024-06-05T17:51:33Z,subramen, 2329300874,2155264743,Thanks for the reply. Will look into the transformers library,2024-06-07T17:44:37Z,Acejoy, 2329300874,2162690775,"Setting is the right approach Just make sure you are using the right token as the tokenizer expects it :) (e.g. spaces at beginning, etc)",2024-06-12T10:45:12Z,osanseviero, 2329300874,2163094954,"> Setting is the right approach 👍 Just make sure you are using the right token as the tokenizer expects it :) (e.g. spaces at beginning, etc) Could you give an example? (specifically for I tried the same, but was not successful. Thanks ",2024-06-12T14:03:42Z,Acejoy, 2328794417,2146277056,(This also reproduces using tiktoken.),2024-06-03T23:11:49Z,josharian, 2322394772,2138008525,"Hi, this might be due to vllm applying the prompt template on top of your templated message. Can you try using the Chat API ( instead?",2024-05-29T18:20:06Z,subramen, 2322394772,2138597875,"> Hi, this might be due to vllm applying the prompt template on top of your templated message. Can you try using the Chat API ( instead? I use the Chat API and take the tail of the sentence. It still produces an endless repeating response. Although this can be solved by defining the termination token, the model always hides the more correct answer until later (as in my example: it responds with 15 first, then realizes the logical mistake and responds with the correct answer). This is very strange. Any way to solve it? ",2024-05-30T03:16:08Z,JJplane, 2322394772,2163217321,"Yes, passing <|eot_id|> as the termination token will prevent the repetition.
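For reference, a sketch of the terminator trick with the Hugging Face port of Llama 3 Instruct; the pattern follows the published model card as far as I know, and the model id is an assumption (swap in your own converted checkpoint path if you are not pulling from the Hub).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumption: gated HF repo or a local converted copy
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "What is 7 * 8? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop on either the regular EOS or the end-of-turn token <|eot_id|>.
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
output = model.generate(input_ids, max_new_tokens=256, eos_token_id=terminators)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```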
If 15 is not the expected response, perhaps you might get a more accurate response if in the prompt you direct the LLM to use chain-of-thought or to think step-by-step before generating the answer?",2024-06-12T14:45:19Z,subramen, 2322087643,2136278675,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-05-28T23:49:56Z,facebook-github-bot, 2322087643,2136292569,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-05-29T00:06:39Z,facebook-github-bot, 2322087643,2140991336,cc since I see you were previously involved in the packaging efforts 🙂 ,2024-05-30T23:17:25Z,ofek, 2319780209,2138154388,"Hi, it looks like you are using huggingface to run the model. I was not able to reproduce this issue, could you provide the entrie script? Could you also try using the model with Transformers pipeline and see if you get a similar result? For reference, you can also check out how to run our model using HF on our Getting Started Guide. We also have an run our model using HF on our Getting Started Guide on our Llama-recipes Github Repo that you could refer to that showcases how to use already converted weights as well as converting them yourself.",2024-05-29T19:51:31Z,fbnav, 2319780209,2138353050,"I downloaded the files config json config from huggingface using the available Meta-Llama repo ! I saved all the files under llama-3-Instruct folder. This is the test using the pipiline method code ! I got the same output. By the way, I tested the available shared method """" un our model using HF on our Getting Started Guide. """" but the output is the question copied N times ... This is the code : ",2024-05-29T22:19:59Z,feki-younes, 2319780209,2138403349,"Thank you for providing more information. Are you using already converted HF weights or the original weights? To use already converted weights, you can try updating the model name to and re-try. Setting the model id to this will use the HF converted Meta-Llama-3-8B-Instruct model to run this example. any folder you may have created and called in this directory before running. Instead, if you downloaded the original weights and saved them in your folder, you'll need to convert them to HF format using the script and update to point to the path of the converted weights. You can find steps to do this in our example notebook.",2024-05-29T23:15:45Z,fbnav, 2319780209,2141698840,"Well thank you very much for this help ! I think the worse idea was to take all config from HF. The model is stopping generation now. Thank you ! 🥇 For anyone facing the same problems : Try the second method : converting localy the weights ! 
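A sketch of that second method, with placeholder paths: convert the Meta-format weights once with the transformers conversion script (the flags below mirror the command quoted earlier in these comments), then load from the converted directory rather than the raw checkpoint folder.

```python
# Conversion is done once, outside Python, roughly as quoted above:
#   python convert_llama_weights_to_hf.py --input_dir Meta-Llama-3-8B-Instruct \
#       --model_size 8B --output_dir ./llama3-8b-instruct-hf --llama_version 3
from transformers import AutoModelForCausalLM, AutoTokenizer

converted_dir = "./llama3-8b-instruct-hf"  # placeholder: whatever you passed as --output_dir
tokenizer = AutoTokenizer.from_pretrained(converted_dir)
model = AutoModelForCausalLM.from_pretrained(converted_dir, device_map="auto")
print(model.config.model_type, model.config.hidden_size)
```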
it worked for me ",2024-05-31T10:11:05Z,feki-younes, 2318167775,2136738687,Same rejected. China?,2024-05-29T07:36:54Z,eanson023, 2318167775,2137711715,Same rejected with no further explainations.,2024-05-29T15:36:22Z,WangYihang, 2318167775,2249919367, Same for me when apply Llama 3 permission. Please help us deal with this problem. My HF account is : **Henry65**. Thank you in anticipation!,2024-07-25T09:45:03Z,HenryStephen, 2316763046,2131662039,"Are you still having this error? If yes – can you attach a screenshot of the stack trace, please",2024-05-25T23:22:01Z,jxtngx, 2316625377,2130704837,my rust version is 1.78.0,2024-05-25T03:01:05Z,YENpsychopomp, 2316625377,2130738991,"problem solved Just install c++",2024-05-25T04:00:27Z,YENpsychopomp, 2315863693,2143631075,"The context length can be increased by changing the sequence length parameter. However, the pretrained model is trained using text upto 8k tokens, just increase the context length will generate poor results. To generalize beyond 8k, further fine tuning with longer texts are necessary. There are documented steps to increase to >1M tokens such as this article, which also involves increasing the rope theta while increasing length of the training sequence. This example training article might also be helpful.",2024-06-01T23:43:15Z,dongwang218, 2315863693,2144151345,"> The context length can be increased by changing the sequence length parameter. However, the pretrained model is trained using text upto 8k tokens, just increase the context length will generate poor results. To generalize beyond 8k, further fine tuning with longer texts are necessary. There are documented steps to increase to >1M tokens such as this article, which also involves increasing the rope theta while increasing length of the training sequence. This example training article might also be helpful. Thanks for answering! That's pretty helpful for me!",2024-06-03T02:07:24Z,ANYMS-A, 2312515578,2143705658,"Sequence packing concatenate multiple short sequences into a single long sequence to improve training and inference efficiency. A block-diagnoal mask is applied to the self attention to prevent attention between different sequences. Within each sequence, the tokens should following the template. For instruction following, LLaMA3's template is explained well in the blog post. For the text completion base model, no such format is required. In both cases, the same attention mask should be applied independently whenever there are packed sequences.",2024-06-02T05:31:45Z,dongwang218, 2312515578,2184194230,"closing the issue, feel free to reopen if necessary.",2024-06-22T21:26:10Z,dongwang218, 2312515578,2330708724,"> Sequence packing concatenate multiple short sequences into a single long sequence to improve training and inference efficiency. A block-diagnoal mask is applied to the self attention to prevent attention between different sequences. Within each sequence, the tokens should following the template. For instruction following, LLaMA3's template is explained well in the blog post. For the text completion base model, no such format is required. In both cases, the same attention mask should be applied independently whenever there are packed sequences. Do you reset position ids between different short sequences?",2024-09-05T06:32:01Z,sz128, 2312515578,2339443685,"In general, positions need to be per sequence. For RoPE, it is not necessary, as the position is not absolute, but relative positions are used. 
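A minimal sketch of the packing scheme described above: an additive attention mask that is causal inside each packed sequence and blocks attention across sequences, plus position ids that restart per sequence (optional when using RoPE, per the comment above). This is an illustration, not code from the repo.

```python
import torch

def packed_mask_and_positions(seq_lens):
    # 0 = may attend, -inf = blocked (additive mask convention).
    total = sum(seq_lens)
    mask = torch.full((total, total), float("-inf"))
    pos = torch.empty(total, dtype=torch.long)
    start = 0
    for n in seq_lens:
        block = torch.full((n, n), float("-inf"))
        block[torch.tril(torch.ones(n, n, dtype=torch.bool))] = 0.0  # causal within the block
        mask[start:start + n, start:start + n] = block
        pos[start:start + n] = torch.arange(n)  # positions restart for each packed sequence
        start += n
    return mask, pos

mask, pos = packed_mask_and_positions([3, 2])
print(pos)   # tensor([0, 1, 2, 0, 1])
print(mask)  # block-diagonal causal mask, -inf everywhere else
```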
More details can be found in Appendix C.1 of this arxiv paper.",2024-09-10T01:41:28Z,dongwang218, 2311957538,2128353129, I saw a guy already come up with llama3 implementation from scratch repo lately. Do you want it? I can give you a source of that repo 👍🏻 ,2024-05-24T02:04:29Z,pavaris-pm, 2311957538,2131147305,"> Anyone new to llama3 and want to build from scratch ,, here i am also..Knock me ,we can work together. Can I work with you?",2024-05-25T08:59:33Z,Drew19980118, 2311957538,2139110448," Can you send the repo for llama3 implementation from scratch ",2024-05-30T09:10:41Z,paneer24, 2311957538,2169075904,How to get started?,2024-06-15T03:15:40Z,bdqnaccphantianyang, 2309692166,2124004593," ",2024-05-22T06:53:19Z,LJ-Hao, 2309692166,2131656226,"If you get the model directly from HF Hub, then it will be in HF format. Here are the Llama 3 models: ",2024-05-25T23:18:21Z,jxtngx, 2309692166,2137953035,"Hi, you could get the models directly from HF which will already be in the HF format and use them directly. You could also get the weights from our website and convert them to HF format using the script. Feel free to check out how to directly from HF. We also have an directly from HF on our Llama-recipes Github Repo that you could refer to that showcases how to use already converted weights as well as converting them yourself. ",2024-05-29T17:47:06Z,fbnav, 2307142813,2122349524,"have you checked minimum required CUDA version NVIDIA driver version for latest ggml? also you can check ggml, and llama.cpp repos for more help on this issue * * ",2024-05-21T10:48:53Z,M-Ali-ML, 2307142813,2575695637," Please did you end up resolving the problem , i have the exact same problem and I almost tried everything , nothing seems to work ! ",2025-01-07T16:15:54Z,Oussamayousre, 2306909435,2121326940,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-05-20T22:33:45Z,facebook-github-bot, 2306909435,2121360732,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-05-20T23:04:29Z,facebook-github-bot, 2305810524,2128570192,"Same for me: thenicealex email: thenicealex4@gmail.com",2024-05-24T05:38:40Z,thenicealex, 2305810524,2143228032,"same here, anyone knows why?",2024-06-01T02:34:19Z,yangxile, 2305810524,2249917426, Same for me when apply Llama 3 permission. Please help us deal with this problem. My HF account is : **Henry65**. Thank you in anticipation!,2024-07-25T09:43:58Z,HenryStephen, 2304710355,2137933086,Are you using a checkpoint from GPTQ or AWQ? You might want to load it using those scripts ,2024-05-29T17:34:32Z,subramen, 2304710355,2241733100,"Its gptq... I didnt quantize the model, its in one of Huggingface repo.. 
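For the GPTQ case mentioned above, a hedged sketch of loading a pre-quantized checkpoint through transformers; the repo id is a placeholder, and the GPTQ integration (optimum plus a GPTQ backend) must be installed for this path to work.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_repo = "someuser/Meta-Llama-3-8B-Instruct-GPTQ"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(quantized_repo)
# The quantization config is read from the repo; the plain fp16 loading path
# in this repository will not handle these weights.
model = AutoModelForCausalLM.from_pretrained(quantized_repo, device_map="auto")
print(type(model).__name__)
```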
",2024-07-21T18:18:47Z,puja93, 2298368913,2113072665,"Hi seems like seq in powershell does not support the -f parameter. To get you unblocked you can locally change your download.sh file by replacing [lined 45 against this: ",2024-05-15T17:22:24Z,mreso, 2298368913,2113905534,"Hello , now the 8b_pre_trained model is downloading. Thank you so much for your help!",2024-05-16T02:37:04Z,Arwindhraj, 2296674709,2113018745,"Hello For Transformers-based fine-tuning you can follow the steps in our Hugging Face blog post, which relies on the library, or you can use the general-purpose Hugging Face blog post. In both cases you'll need to use Hugging Face blog post.",2024-05-15T16:48:28Z,pcuenca, 2295711025,2111880922,I also encountered this problem 。This is because some syntax of wget is not compatible on Ubuntu。Try using download.sh This issue does not occur,2024-05-15T08:24:11Z,a492557688, 2295711025,2425237210,I am trying to complete an Arm Learning Path: which is broken at the place where a download.sh script is expected. How can I properly download lllama 3.2 and continue with this procedure?,2024-10-20T21:42:20Z,hybotix, 2294590106,2110463798,"Thanks for reporting this, you are welcome to open github issues to report any other bugs you may encounter :) Multilingual support is coming soon for Llama - although it can recognize words from different languages it hasn't been specifically finetuned to support them (yet).",2024-05-14T14:57:15Z,subramen, 2294590106,2122235248," thanks for your info, just a quick a couple of question regarding Multilingual support; * Will that means the model will be on non-English data? * Are there an ETA of such a model?",2024-05-21T09:53:58Z,mohblnk, 2294590106,2137943362,"No ETA yet, but please stay tuned for updates.",2024-05-29T17:41:02Z,subramen, 2294531360,2109372228,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-05-14T06:18:39Z,facebook-github-bot, 2294531360,2109434111,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-05-14T07:04:03Z,facebook-github-bot, 2293823361,2108908846,"Hi, can you verify that you're using the same env when running torchrun that you used to run ?",2024-05-13T22:28:06Z,mreso, 2293823361,2108927364,"> Hi, can you verify that you're using the same env when running torchrun that you used to run ? i am using venv but i got same error, can you please help me how can i verify that i am using the same env ? so i can post my result here.",2024-05-13T22:44:29Z,Salman-Malik1, 2293823361,2108940925,"> Hi, can you verify that you're using the same env when running torchrun that you used to run ? i tried in both venv python2.7 and 3.9 but same error. 
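One quick way to settle the "same environment?" question above: save the snippet below under any name (check_env.py here is arbitrary) and run it both as `python check_env.py` and as `torchrun --nproc_per_node 1 check_env.py`. Differing interpreter paths or torch versions mean torchrun is resolving a different environment than the one you installed into.

```python
import sys

print("interpreter:", sys.executable)
print("python     :", sys.version.split()[0])
try:
    import torch
    print("torch      :", torch.__version__)
except ImportError:
    print("torch      : not importable from this environment")
```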
",2024-05-13T22:56:36Z,Salman-Malik1, 2293823361,2108944092,"Can you post the outputs of: and ",2024-05-13T23:00:20Z,mreso, 2293823361,2108948485,"python -m pip list torchrun --nnodes 1 -m pip list ",2024-05-13T23:04:31Z,Salman-Malik1, 2293823361,2108985969,"What jumps out to me is that you've packages llama and llama3 installed. And llama points to a version 3 folder, though the name llama is connected to llama 2 for pip installations. Easiest would be to try a new env and only install llama once either using or .",2024-05-13T23:23:59Z,mreso, 2293823361,2109008396,"> What jumps out to me is that you've packages llama and llama3 installed. And llama points to a version 3 folder, though the name llama is connected to llama 2 for pip installations. Easiest would be to try a new env and only install llama once either using or . when i run this command it install both itself llama and llama3.",2024-05-13T23:45:08Z,Salman-Malik1, 2293823361,2109050361,"> What jumps out to me is that you've packages llama and llama3 installed. And llama points to a version 3 folder, though the name llama is connected to llama 2 for pip installations. Easiest would be to try a new env and only install llama once either using or . getting same error :( ",2024-05-14T00:26:47Z,Salman-Malik1, 2293823361,2111115525,"> What jumps out to me is that you've packages llama and llama3 installed. And llama points to a version 3 folder, though the name llama is connected to llama 2 for pip installations. Easiest would be to try a new env and only install llama once either using or . I am currently facing a critical issue that requires immediate attention and I believe your expertise would be invaluable in resolving it. Given the urgency, I am prepared to provide you with all necessary access credentials securely and I'm willing to compensate you for your prompt and professional service. it is impacting our operations significantly. Please let me know your availability at your earliest convenience, and your terms for such urgent tasks. You can reach me directly at this email (salman.malik@onboardsoft.com).",2024-05-14T20:49:04Z,Salman-Malik1, 2293823361,2274927513,"I have the same issue as the author. Did anyone find a solution? ",2024-08-08T04:27:50Z,magdalenapasternak, 2291025413,2110476267,"The error is probably related to the init_method arg you have passed... why are you passing that in? Ensure your machine has 8 GPUs as that is a requirement for 70B. If not, then you can use HF to load the 70B model. Running on windows is possible with gloo, pls take a look at for how they did it. ",2024-05-14T15:02:35Z,subramen, 2290719303,2110486008,"The 4k refers to the dimensionality of the model's layers, not the context length. Both models have a similar architecture. For more info on the differences, take a look at ",2024-05-14T15:06:36Z,subramen, 2289665071,2106025651,"So, did META just change the model card page after my github issue, completely ignoring this issue? :) ",2024-05-11T20:38:05Z,Sneakr, 2289665071,2110526155,"> However, having no system message string present but still include the system token, results in a completely different output compared to having no system token at all. Are you referring to a case where you pass the system header but no system_prompt, i.e. Getting a different output is expected behavior because the template is sensitive to the header; the model is expecting a system message but it is getting an empty string. If you don't have a system message it is better to not include the system header. 
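A sketch of the Instruct prompt layout being discussed, written out by hand so the "omit the header when there is no system message" advice is concrete. The special-token strings follow the published Llama 3 format; this is an illustration rather than the repo's own ChatFormat code.

```python
def encode_dialog(user: str, system: str | None = None) -> str:
    parts = ["<|begin_of_text|>"]
    if system:  # emit the system block only when a system message actually exists
        parts.append(f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>")
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>")
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

print(encode_dialog("Hello!"))                        # no system header at all
print(encode_dialog("Hello!", system="Be concise."))  # header plus system text
```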
This is how we encode dialogs I don't think the changes to the model-card are related to this issue, but we'd appreciate your suggestions to improve its clarity :) cc ",2024-05-14T15:25:21Z,subramen, 2289665071,2111254594," Thanks for your response. Yes, that's what I'm referring to. > Getting a different output is expected behavior because the template is sensitive to the header; the model is expecting a system message but it is getting an empty string. It is indeed expected behavior, as the input becomes is different, the output would be different. However the question is which output is the expected one by the author of the model and the training process. As per my findings, If the model has been trained with system headers present (in my case fine tuned): And later inferenced as per the tokenizer.py you referenced **Conclusion**: It produces a different output which breaks the behaviour of the training progress and the training data - if the system headers are not present as they were during the training process. > If you don't have a system message it is better to not include the system header. This is how we encode dialogs 1: Why would it not be included if it was trained with a system header? Wouldn't it be logical to assume that your outputs during training is the one we should expect during inference, and therefore keep the system headers as is regardless of an empty system message or not? 2: What makes you conclude that it is better to leave out the system message? We have 2 different outputs, how do we come to that conclusion that one output (without system headers) would be better than the other (with system headers)? In my tests, the opposite is true, especially during tuning and training, leaving out tokens that were present during training would break the expected output. I'm grateful for clarification and your response! :) _In regards to the model card page, it is something only one can speculate and only the author of the page knows the reason for the changes, it is peculiar however that my quoted wordings were completely removed just a day after my issue here. But no clarification shined on this thread. But let's leave that aside and focus on the issue at hand._ ",2024-05-14T22:37:13Z,Sneakr, 2289665071,2113022448,"My response is based on the assumption that the model was NOT finetuned with a system header & null system prompt ie. So i would not expect it to give good results. If you are getting better results with a null prompt, that's interesting - if you can share it, please DM me on twitter (same handle as github username).",2024-05-15T16:50:51Z,subramen, 2289665071,2113072200,"> My response is based on the assumption that the model was NOT finetuned with a system header & null system prompt ie. No no , you are correct, the better result is if it was trained with system headers and later inferenced with the system headers present too , regardless of null system message. The second question I mean and the question is for the official Meta instruct model: [https Should the system headers be present or not, regardless of null system prompt?",2024-05-15T17:22:07Z,Sneakr, 2289665071,2123071504,"Just leaving this in here Edit: After lifting a different issue with PHI missing the system tokens in the tokenizer config they removed the system tokens in the fine tuning script due to not being supported by the model. However, this is not the case for Llama3 instruct, as the system token seems to be supported by the model. 
",2024-05-21T17:07:03Z,Sneakr, 2289665071,2156772634," Not sure why this was marked as completed, the issue has not been resolved or answered at all.",2024-06-09T20:23:12Z,Sneakr, 2286972601,2102103954,"asked for another url through this link and then download through new URL link provided by Meta",2024-05-09T07:22:51Z,RabiaSamad, 2286972601,2102963436,"> asked for another url through this link > > and then download through new URL link provided by Meta Yes, I did. Still same issue. However I was able to access via HuggingFace, after authenticating hugging face token.",2024-05-09T16:06:03Z,deepakdhiman7, 2286972601,2113520387,"Same issue, I get a couple of followed by a . ",2024-05-15T21:57:32Z,PladsElsker, 2286972601,2124312205,"I encountered a similar problem because my download link was more than 24 hours old. Just regenerate it. Remember that the links expire after 24 hours and a certain amount of downloads. You can always re-request a link if you start seeing errors such as . ",2024-05-22T09:31:19Z,victorwrage, 2286972601,2124805790,"My link was around 5 minutes to 10 minutes old, and I tried with 3 different links that were freshly generated. All of these fresh links gave me the same errors. ",2024-05-22T13:30:26Z,PladsElsker, 2286972601,2126240877,"I download success after using proxy, try to set your own proxy address, examples are as bellow. On Mac: then try wget to download again. On Windows: write a script by python requests: you can find url in read it and compose your own model url download address or just use echo to check. A python version of download with proxy can be view at ",2024-05-23T05:05:31Z,YuleZhang, 2286069869,2101074963,"Hi, could you provide more information on what you are trying to run it on? Also, please try reinstalling PyTorch and try running it again. You can do it from here : ",2024-05-08T17:34:05Z,fbnav, 2286069869,2103830390,"I got the same error. My OS is windows 11. Here is what I got with ",2024-05-10T04:16:38Z,tungts1101, 2286069869,2103860554,It seems like Windows doesn't support NCCL backend. Does it mean that I can only run on linux based machine?,2024-05-10T04:52:45Z,tungts1101, 2286069869,2103928131,"I have tried again with my Ubuntu 22.04 installed under WSL. The error has disappeared but I still get this error when trying to run the example ",2024-05-10T06:03:41Z,tungts1101, 2286069869,2104892767,"Could you please provide the complete error message and your hardware specs, along with the code you tried to run? NCCL isn't supported on Windows. If you are running on Windows, can you please check here and use and try if that works?",2024-05-10T16:26:12Z,fbnav, 2286069869,2106208054,"> Could you please provide the complete error message and your hardware specs, along with the code you tried to run? > > NCCL isn't supported on Windows. If you are running on Windows, can you please check here and use and try if that works? Above is the complete error message when I try to run the example in README file. The OS is Ubuntu 22.04 with Intel core i5, RTX 3050 Laptop GPU.",2024-05-12T11:03:24Z,tungts1101, 2286069869,2107362417,I think the root cause is the hardware doesn't meet the minimum requirement to run the llama-7B model.,2024-05-13T11:46:50Z,tungts1101, 2286069869,2108130888,Yes it might be that. You will need a min VRAM of ~16GB to run the 8B model in fp16 precision.,2024-05-13T16:17:19Z,fbnav, 2286069869,2123033369,Closing this issue. 
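The ~16GB figure above is roughly the weight storage alone, before the KV cache, activations, and CUDA context; a quick back-of-the-envelope check:

```python
params = 8e9          # Llama 3 8B, approximately
bytes_per_param = 2   # fp16 / bf16
weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.1f} GiB just for the weights")  # ~14.9 GiB, hence a 16GB-class GPU at minimum
```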
Feel free to re-open if the issue persists.,2024-05-21T16:44:33Z,fbnav, 2285955516,2101873889,Basically I have completed the whole prompt engineering stuff with llama3 which can be used to create your own chat model inside your own pc or computer. I could upload how to do it by a jupyter notebook or a markdown file to help new learners to make their own chat model with the help of the llama! Should I create a pull request for notebook or readme file? Reply me,2024-05-09T03:38:28Z,Gitstar-OC, 2285955516,2103188287,"I would love to look at your code. I've been working on understanding why I can't load my vanilla Llama 3 model straight into Langchain and why would I need to convert it and quantize it. I'm trying learn this by myself and it's quite slow because there is not much out there. So cheers if you can do a pull request, I would certainly try it out. There are many ways you can use ollama and huggingface but nothing about creating your own API... Patrick Miron",2024-05-09T18:19:50Z,DragonAngel1st, 2285955516,2103761665,"Great I will make a pull request today, closing this issue. Thanks for your reply!",2024-05-10T02:47:27Z,Gitstar-OC, 2285447017,2109236445,Can you get specify which model you are using? We didn't release a quantized model as part of our release so we may not be able to support you here. :(,2024-05-14T04:02:09Z,jspisak, 2284788236,2101078347,"Hi, for chat use case, please use the Llama Instruct model. Here's a link to the model : ",2024-05-08T17:36:08Z,fbnav, 2284453013,2101088009,"Hi, it looks like you are running this on Windows. NCCL isn't supported on Windows. Can you please check here and use and try if it works?",2024-05-08T17:42:34Z,fbnav, 2284453013,2103180495,"I have the same problem. So I tried with replacing torch.distributed.init_process_group(""nccl"") with torch.distributed.init_process_group(backend='gloo'). However it incurs the same error.",2024-05-09T18:14:25Z,jhyangkorea, 2284453013,2119253861,"Hey, I have the same issue running on Windows. After replacing the nccl for gloo I get the following: Is there a solution to running the model on windows already?",2024-05-19T14:16:46Z,Endote, 2284453013,2123133959,"Hi, this might be because you may be using a version of PyTorch that is not compatible with your CUDA version. Could you try to try upgrading to a newer version of PyTorch that supports the   attribute? Or you can try reinstalling PyTorch and running it again. You can do it from here : [https Also linking a similar issue where replacing the backend with worked on Windows OS for reference : [https For this repo specifically, the example scripts are for running inference on single (for 8B) and multi (for 70B) GPU setups using CUDA, but Windows is not currently supported. Feel free to check out the examples on the llama-recipes repo for running Llama locally via hugging face or ollama: [https ",2024-05-21T17:46:06Z,fbnav, 2283776829,2113903671," I encountered this issue while attempting to download. I then proceeded to download all the files, and now the pretrained models are downloading. Try downloading all the files ones.",2024-05-16T02:34:33Z,Arwindhraj, 2283776829,2146270842," I see ""Connecting to 127.0.0.1:1080"" instead of download<>.llamameta.net. Is there a proxy that you are using?",2024-06-03T23:05:04Z,samuelselvan, 2283776829,2187310333,Please feel to re-open if needed. Thanks.,2024-06-24T20:03:01Z,samuelselvan, 2283579310,2098604538,"Hi Thank you for your pull request and welcome to our community. 
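Picking up the Windows/NCCL thread above: a hedged sketch of the nccl-to-gloo fallback for the process-group init. It only changes the backend selection; the CUDA-specific parts of the example scripts still assume an NVIDIA GPU, so this is not a full Windows or Mac port.

```python
import torch
import torch.distributed as dist

def init_distributed() -> str:
    # NCCL is Linux + NVIDIA only; gloo works on CPU and on Windows.
    backend = "nccl" if (dist.is_nccl_available() and torch.cuda.is_available()) else "gloo"
    dist.init_process_group(backend=backend)
    return backend

# Launch with: torchrun --nproc_per_node 1 this_script.py
if __name__ == "__main__":
    print("initialized process group with", init_distributed())
```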
# Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-05-07T14:55:12Z,facebook-github-bot, 2283579310,2098637296,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-05-07T15:04:29Z,facebook-github-bot, 2282875785,2098001432,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-05-07T10:25:41Z,facebook-github-bot, 2282504038,2097629950,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-05-07T07:27:32Z,facebook-github-bot, 2282504038,2097700257,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-05-07T08:07:48Z,facebook-github-bot, 2282504038,2110560008,"Thanks - although this isn't a critical change, it can help improve readability. The correct token is , if you can update the PR i'll merge it",2024-05-14T15:40:20Z,subramen, 2282504038,2110765875, ah that's right! :) updated now,2024-05-14T17:29:25Z,antonioramos1, 2281514145,2118248457,cc and This looks like an issue with model access on HF. 
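On the recurring NCCL-on-Windows failures in the threads above: a minimal sketch, assuming a single-process run of this repo's example scripts, of creating the process group with gloo before Llama.build() is called, which is the workaround several commenters report using:

```python
import os
import torch.distributed as dist

# Defaults for a single-process run; torchrun normally sets these for you.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

# NCCL has no Windows build. Llama.build() only initializes a process group when one
# does not already exist, so initializing gloo first keeps it from requesting NCCL.
if not dist.is_initialized():
    dist.init_process_group(backend="gloo")
```

Note this only swaps the communication backend; the example scripts still expect a CUDA device for the model itself.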
Any suggestions on how they can download all the their request?,2024-05-17T19:29:32Z,fbnav, 2279054973,2095173665,"Why tensorflow seen in the log, should use PyTorch instead? 2024-05-06 11 03.478993: I This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.",2024-05-06T04:20:30Z,SidneyLann, 2279020765,2098831199,"Is it not possible currently to run llama3 locally on mac? If so, is there a way to use llama3 on Macbooks vscode?",2024-05-07T16:15:15Z,thestud1, 2279020765,2098850985,"> Is it not possible currently to run llama3 locally on mac? Not the native models released by Meta, only the HF models with a 3rd party tool like ",2024-05-07T16:26:07Z,ashwini, 2279020765,2098899342,"thanks, but that is kinda sadge. Spent 2h today to make it work unluuuuucky. ",2024-05-07T16:54:10Z,thestud1, 2279020765,2105964870,"Does not run on mac, my computer is macOs, CPU is M2, boot prompt Change the nccl in to Tip: line 399, in set_device [rank0]: torch._C._cuda_setDevice(device) [rank0]: AttributeError: module 'torch._C' has no attribute '_cuda_setDevice' E0512 01 16.362418 7939186368 failed (exitcode: 1) local_rank: 0 (pid: 5719) of binary: ",2024-05-11T17:36:41Z,p19971018, 2279020765,2116333755, how can we make this a feature request?,2024-05-16T22:59:04Z,ashwini, 2279020765,2118095995,"More information, like the dependencies required for OS (not Mac), would be nice in the readme.",2024-05-17T17:35:56Z,VelizarVESSELINOV, 2279020765,2154250286,Would a contribution for this be welcome or is it a non-priority? I can start building on top and maybe we can get somewhere?,2024-06-07T07:24:39Z,hknlof, 2279020765,2227061616,"> Does not run on mac, my computer is macOs, CPU is M2, boot prompt Change the nccl in to Tip: I'm with you! I tried using the 'gloo' backend for distributed initialization, avoiding issues related to 'nccl' and missing environment variables, but that didn't work...",2024-07-13T19:34:16Z,Atreyu4EVR, 2279020765,2381096533,Try this one and feedbacks are welcome.,2024-09-29T04:00:17Z,kuizhiqing, 2279010700,2113030314,"Hi, I'm not sure what your question is. Can you share minimal code snippets so we can better understand your query?",2024-05-15T16:55:50Z,subramen, 2278811759,2094296305," ",2024-05-04T16:44:08Z,CrossPr0duct, 2278811759,2101020047,"Both are fine, in the first one you're letting the LLM determine what the first output token should be, whereas in the second one you are enforcing the first output token to be a newline and have the LLM complete it from there",2024-05-08T17:00:57Z,subramen, 2278012518,2187322604, do you mind submitting another request? ,2024-06-24T20:11:29Z,samuelselvan, 2278012518,2218795411,Please re-open if still needed.,2024-07-09T21:53:25Z,samuelselvan, 2276696674,2105964112,What is the best way to adapt the 8 checkpoints for for the 70B model to say 16 A100-40GB ? ,2024-05-11T17:33:08Z,whatdhack, 2276696674,2113031656,Please see this thread: ,2024-05-15T16:56:39Z,subramen, 2276696674,2237984402," ",2024-07-19T02:58:03Z,dirtycomputer, 2276696674,2284391468," , looks like there are more fundamental issues in adapting the 8 GPU checkpoint to any number higher than 8 . See the following. 
`
self.n_kv_heads = args.n_heads if args.n_kv_heads is None else args.n_kv_heads
model_parallel_size = fs_init.get_model_parallel_world_size()
self.n_local_heads = args.n_heads // model_parallel_size
self.n_local_kv_heads = self.n_kv_heads // model_parallel_size
self.n_rep = self.n_local_heads // self.n_local_kv_heads
self.head_dim = args.dim // args.n_heads
` ",2024-08-12T16:16:00Z,whatdhack, 2274555093,2099615931,"Hello, I am using your code according to the requirements of llama3, and I get an error that stops it from running. May I ask if there is an issue with my environment dependencies? How should I modify it? I am using an i5-13500H CPU and running in the wsl2 environment. > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 ERROR failed (exitcode: -9) local_rank: 0 (pid: 208310) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) ^^^^^^ File line 346, in wrapper return f(*args, **kwargs) ^^^^^^^^^^^^^^^^^^ File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ======================================================= example_chat_completion.py FAILED ------------------------------------------------------- Failures: ------------------------------------------------------- Root Cause (first observed failure): [0]: time : 2024-05-08_10 24 host : chenqingyuICer. rank : 0 (local_rank: 0) exitcode : -9 (pid: 208310) error_file: traceback : Signal 9 (SIGKILL) received by PID 208310 =======================================================",2024-05-08T02:26:39Z,13230668653, 2274555093,2099694157,"> Hello, I am using your code according to the requirements of llama3, and I get an error that stops it from running. May I ask if there is an issue with my environment dependencies? How should I modify it? I am using an i5-13500H CPU and running in the wsl2 environment. > > > initializing model parallel with size 1 > > initializing ddp with size 1 > > initializing pipeline with size 1 > > ERROR failed (exitcode: -9) local_rank: 0 (pid: 208310) of binary: > > Traceback (most recent call last): > > File line 8, in > > sys.exit(main()) > > ^^^^^^ > > File line 346, in wrapper > > return f(*args, **kwargs) > > ^^^^^^^^^^^^^^^^^^ > > File line 794, in main > > run(args) > > File line 785, in run > > elastic_launch( > > File line 134, in __call__ > > return launch_agent(self._config, self._entrypoint, list(args)) > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > File line 250, in launch_agent > > raise ChildFailedError( > > torch.distributed.elastic.multiprocessing.errors.ChildFailedError: > > ======================================================= > > example_chat_completion.py FAILED > > ## Failures: > > # Root Cause (first observed failure): > [0]: > time : 2024-05-08_10 24 > host : chenqingyuICer. > rank : 0 (local_rank: 0) > exitcode : -9 (pid: 208310) > error_file: > traceback : Signal 9 (SIGKILL) received by PID 208310 It turns out that the memory is exploding. What is the size of your computer's memory? I allocated 28GB, but it still doesn't work.",2024-05-08T04:00:10Z,13230668653, 2273571180,2110818919,"Looks like you're using the quantized models, which might be hampering the model's performance on numerical data. 
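To make the sharding arithmetic in the model.py lines quoted a few comments above concrete, a small worked example using the published Llama 3 70B shape (dim=8192, n_heads=64, n_kv_heads=8) and the 8-way sharding of the released checkpoints:

```python
# Llama 3 70B attention shape with the released 8-shard checkpoints.
dim, n_heads, n_kv_heads = 8192, 64, 8
model_parallel_size = 8

n_local_heads = n_heads // model_parallel_size        # 8 query heads per GPU
n_local_kv_heads = n_kv_heads // model_parallel_size  # 1 KV head per GPU
n_rep = n_local_heads // n_local_kv_heads             # each KV head serves 8 query heads
head_dim = dim // n_heads                              # 128

print(n_local_heads, n_local_kv_heads, n_rep, head_dim)  # -> 8 1 8 128

# With model_parallel_size = 16, n_kv_heads // 16 == 0, which is why simply re-sharding
# the 8-GPU checkpoints onto more than 8 GPUs breaks: the 8 KV heads would have to be
# replicated rather than split further.
```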
I cannot replicate this issue on the official meta llama models, I get 11110 from both 8b and 70b models. Try increasing the temperature, 0.01 sounds quite low.",2024-05-14T18:03:32Z,subramen, 2273398782,2088371094,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-05-01T12:04:10Z,facebook-github-bot, 2273398782,2088434623,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-05-01T13:04:33Z,facebook-github-bot, 2273160859,2088694435,ssh can work!,2024-05-01T16:10:13Z,clean-e2map, 2270569636,2087595600,"Título: “Relatividad” Imágenes de un reloj antiguo, ecuaciones de Einstein en un pizarrón, un tren en movimiento (para representar la dilatación del tiempo), un agujero de gusano, la paradoja del abuelo representada artísticamente, una máquina del tiempo, y representaciones del futuro, pasado y presente. ",2024-04-30T22:29:12Z,salomeai, 2269692493,2083390792,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-04-29T18:30:34Z,facebook-github-bot, 2269692493,2085212984,Blox fruit is a roblox game. This PR doesn't make sense. Please Close.,2024-04-30T12:34:38Z,shelwinsunga, 2269078071,2109237648,"thanks - given this is legal approved language, we are going to keep it as is. Appreciate the PR..",2024-05-14T04:03:48Z,jspisak, 2267499268,2088772459,This is not an API Meta Llama offers. Please reach out to for this issue,2024-05-01T17:03:00Z,subramen, 2267438677,2085035105,same,2024-04-30T11:23:22Z,Xtj1999, 2267438677,2088661014,"是他妈了个逼的长城防火墙的问题。 他妈了个逼的,还得让老子花钱买VPN,狗日的玩意。",2024-05-01T15:47:16Z,missfmaster, 2267438677,2105589010,Download successfully with VPN,2024-05-11T06:17:49Z,wqw547243068, 2267438677,2240899491,"> 是他妈了个逼的长城防火墙的问题。 > > ` > 正在连接 127.0.0.1:1087... 已连接。 > 已发出 Proxy 请求,正在等待回应... 
200 OK > 长度:16060617592 (15G) > 正在保存至: > > 0%[ ] 25.49M 剩余 6h 13m > ` > > 他妈了个逼的,还得让老子花钱买VPN,狗日的玩意。 唉 真是他妈个比的一言难尽 ",2024-07-20T03:50:40Z,MacJayLee, 2267438677,2252044590,"> 是他妈了个逼的长城防火墙的问题。 > > ` > 正在连接 127.0.0.1:1087... 已连接。 > 已发出 Proxy 请求,正在等待回应... 200 OK > 长度:16060617592 (15G) > 正在保存至: > > 0%[ ] 25.49M 剩余 6h 13m > ` > > 他妈了个逼的,还得让老子花钱买VPN,狗日的玩意。 有 VPN 了也不行呢?头大",2024-07-26T06:11:19Z,walkingleo, 2267438677,2288011784,想问下,是vpn模式不对吗,我也有vpn,但是也不行,还是报403,2024-08-14T07:10:05Z,jingxdy, 2267438677,2290455054,VPN好像需要开全局,然后IP到美国,2024-08-15T03:02:47Z,Xtj1999, 2267438677,2311877764,"> 是他妈了个逼的长城防火墙的问题。 > > ` > 正在连接 127.0.0.1:1087... 已连接。 > 已发出 Proxy 请求,正在等待回应... 200 OK > 长度:16060617592 (15G) > 正在保存至: > > 0%[ ] 25.49M 剩余 6h 13m > ` > > 他妈了个逼的,还得让老子花钱买VPN,狗日的玩意。 hhhhhhhhh最后解决了么 ",2024-08-27T08:24:58Z,CodeDuoGun, 2267438677,2311878960," me too",2024-08-27T08:25:27Z,CodeDuoGun, 2267438677,2416933456,True...... VPN is all you need . what can i say,2024-10-16T13:59:22Z,Stephentting, 2267407693,2081374552,"yes, I think they use ""Horizontal Model Sharding"" ",2024-04-28T07:40:34Z,Lynn1, 2267407693,2088658140,"Yes, running the 70B needs 8 GPUs as it has 8 shards. You can run it on a different number of GPUs via huggingface.",2024-05-01T15:45:12Z,subramen, 2267407693,2097910926,"> Yes, running the 70B needs 8 GPUs as it has 8 shards. You can run it on a different number of GPUs via huggingface. How to run it on a different number of GPUs via huggingface? ",2024-05-07T09:55:34Z,xiaoToby, 2267407693,2141278233,I wonder about this. I would appreciate it if you could answer this.,2024-05-31T05:51:09Z,Genie-Kim, 2267407693,2150522420, ,2024-06-05T16:50:41Z,subramen, 2267392300,2088776581,"Hi, you could try using to speed up inference with CUDA graphs. We have some examples using VLLM here: ",2024-05-01T17:06:03Z,subramen, 2267118357,2085864481,"Hi! Since your question is related to ollama, please ask this on the ollama repository. This repo is only related to the llama3 model inference.",2024-04-30T16:24:10Z,subramen, 2266959435,2081437538,"Same as me Why is that? I am zhangzhao219 ",2024-04-28T11:11:09Z,zhangzhao219, 2266959435,2084661961,"same My: aolong",2024-04-30T08:07:57Z,li-aolong, 2266959435,2094569934,"same letsgo-2",2024-05-05T03:45:58Z,letsgo-2, 2266959435,2105690526,Same for me: evilfreelancer,2024-05-11T11:47:48Z,EvilFreelancer, 2266959435,2127218952,"same for: THXTHX please",2024-05-23T14:07:01Z,TangHengxuan, 2266959435,2131306681,"Same for me: SKFE Is there a way to submit another request?",2024-05-25T15:16:49Z,SKFE396, 2266959435,2151494601,"Same for me: wangersjtu Is there a way to submit another request?",2024-06-06T06:20:08Z,Wanger-SJTU, 2266959435,2190413068,"Same for me: Henry65 Please!!!",2024-06-26T02:21:12Z,HenryStephen, 2266959435,2249817971,"Same for me: chtsy Please!!!",2024-07-25T08:55:08Z,chtsy, 2266959435,2268061731,"Same for me: canux Please !!!",2024-08-05T02:36:44Z,crazy-canux, 2266959435,2274857467,"same for me, my hf user name is: JerryJH, thank u very much.",2024-08-08T02:56:10Z,Killerofthecard, 2266959435,2283406598,"same, my hf user name is: Niuda0931 thank you very much.",2024-08-12T08:42:07Z,Sooner0931, 2266959435,2314482035,"same, my hf user name is: AprilZhang2024 Please !!! thank you very much.",2024-08-28T07:04:03Z,AprilYapingZhang, 2266959435,2345139805,"same for me. my user name is dshm Is there a way to submit another request? thank you very much",2024-09-12T02:27:49Z,dshm, 2266959435,2451557649,"same for me. my user name is yuzhang2024. 
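On the earlier question about running the 70B weights on a GPU count other than 8: the usual route via Hugging Face is to let accelerate place the layers automatically with device_map="auto". A sketch, assuming the gated meta-llama checkpoint on the Hub, `pip install accelerate`, and enough total VRAM across the visible GPUs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub id; requires accepted access to the gated repo on Hugging Face.
model_id = "meta-llama/Meta-Llama-3-70B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spreads the layers over however many GPUs are visible (2, 4, 16, ...)
)

inputs = tokenizer("The three primary colors are", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```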
thank you!!!",2024-11-01T09:08:32Z,yuzhang-cs, 2266959435,2484708713,"same for me. my user name is han1n thank you very much",2024-11-19T04:56:24Z,hang-1n-there, 2266959435,2594424190,"same, my hf user name is xp123 Please !!! thank you very much.",2025-01-16T03:54:25Z,Essence9999, 2266634196,2080417624,I have the same issue.,2024-04-27T08:36:49Z,edzq, 2266634196,2093958458,I have the same issue,2024-05-04T02:07:46Z,Itime-ren, 2266634196,2105700129," same issue. what more information would you need ? cause the current state for me is: 1. I have pytorch and cuda 12.1 installed 2. model 70B-Instrcut downloaded in the correct directory as the example repo 3. run the torchrun command as specified and getting the above error. i also changed the backend from nccl to gloo to account for the warnings that were appearing, maybe that has something to do with it ?",2024-05-11T11:58:40Z,nightsSeeker, 2266634196,2110455095,"How many GPUs are you using? the 70B model will need 8GPUs to run from this repo. If you have less than 8 GPUs, please use the model from HF",2024-05-14T14:53:19Z,subramen, 2266634196,2110497041,"my solution for reference: ",2024-05-14T15:11:30Z,Lynn1, 2266634196,2157142589,Can 8 gpus be on multiple nodes?,2024-06-10T03:42:05Z,trung6, 2265908559,2081540820,"Hi, I am also trying to output the attention weight of llama3. Have you tried output Llama3 attention weight with itself? (for example, outputs = model.generate(tokens, output_attentions=True))",2024-04-28T16:26:06Z,Icamd, 2265908559,2081684401,"> Hi, I am also trying to output the attention weight of llama3. Have you tried output Llama3 attention weight with itself? (for example, outputs = model.generate(tokens, output_attentions=True)) Hi I think you're using the Huggingface version? I have tried using the same thing you have, but the attention weights I get are of a strange shape. Usually, attention weights have the shape (batch_size, num_heads, seq_length, seq_length), but in Huggingface Llama's case, I get a mismatch in the batch_size axis. It is my guess that since output_attentions is not actually a parameter in the model architecture shown in this repo, Huggingface does something internally to calculate the attention weights, and thus provides wrong values. I could be wrong, of course. I also get some warnings whenever I have tried to do this with Huggingface. That's why I am using the PyTorch version of this model instead.",2024-04-28T22:43:06Z,bear96, 2265908559,2082293794,"> > Hi, I am also trying to output the attention weight of llama3. Have you tried output Llama3 attention weight with itself? (for example, outputs = model.generate(tokens, output_attentions=True)) > > Hi I think you're using the Huggingface version? I have tried using the same thing you have, but the attention weights I get are of a strange shape. Usually, attention weights have the shape (batch_size, num_heads, seq_length, seq_length), but in Huggingface Llama's case, I get a mismatch in the batch_size axis. It is my guess that since output_attentions is not actually a parameter in the model architecture shown in this repo, Huggingface does something internally to calculate the attention weights, and thus provides wrong values. I could be wrong, of course. I also get some warnings whenever I have tried to do this with Huggingface. That's why I am using the PyTorch version of this model instead. 
I think the Huggingface version's attention weights have the shape (output_token_number, layers, batch_size, heads, input_token_number, input_token_number), for example (50, 32, 1, 32, 251, 251), but I am not sure. I am still trying to visualize the attention between tokens to find out if there is any connection. However, I get strange model outputs when using it in Google Colab :( ",2024-04-29T09:43:18Z,Icamd, 2265908559,2082940331," Hi! I found that this paper, ""Analyzing the Structure of Attention in a Transformer Language Model"", mentions something called ""Null Attention"", which it describes as ""attention focused on the first token"". Maybe you can try to mask the first token's attention so it won't influence the overall attention weight? (I'm not sure)",2024-04-29T14:45:36Z,Icamd, 2265908559,2083115523,"> Hi! I found that this paper, ""Analyzing the Structure of Attention in a Transformer Language Model"", mentions something called ""Null Attention"", which it describes as ""attention focused on the first token"". Maybe you can try to mask the first token's attention so it won't influence the overall attention weight? (I'm not sure) I'll definitely check that out! Thanks! ",2024-04-29T16:01:22Z,bear96, 2265908559,2093287940,"I believe I have solved the issue. I was taking an average across all 32 heads and then applying a softmax function to get them to appear as probabilities, but that caused a lot of minute changes in the attention weights to disappear, leaving an almost uniform distribution of weights. I'm trying to visualize the attention weights with respect to individual heads instead. Due to Null Attention, as cited above, the first token has extremely high attention weights, whereas the rest of the weights vary in an exponential way, so I am having to take the log of these weights instead for better visualizations. I am not sure why Null Attention occurs, however. If someone knows more about this, please let me know!",2024-05-03T15:55:46Z,bear96, 2265825470,2079643511,"If you specify the ""format"" and set it to ""json"", you will have your desired results.",2024-04-26T15:42:43Z,aqib-mirza, 2265825470,2081104018,"For the llama3 8b instruct model, how do I use this format parameter? Can you share an example or prompt-related documentation?",2024-04-27T17:14:28Z,Dineshkumar-Anandan-ZS0367, 2265825470,2081149341,"Here is an example code """"""model_id = pipeline = transformers.pipeline( ""text-generation"", model=model_id, model_kwargs={""torch_dtype"": torch.float16}, device=""cuda"", token = ""HF-Token"" ) messages = [ {""role"": ""system"", ""content"": ""You are a pirate chatbot who always responds in pirate speak! and return every answer in JSON format""}, {""role"": ""user"", ""content"": ""Who are you?""}, ] prompt = pipeline.tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True, format = ""JSON"" ) terminators = [ pipeline.tokenizer.eos_token_id, pipeline.tokenizer.convert_tokens_to_ids(""<|eot_id|>"") ] outputs = pipeline( prompt, max_new_tokens=256, eos_token_id=terminators, do_sample=True, temperature=0.6, top_p=0.9, ) print(outputs[0][""generated_text""][len(prompt):])""""""",2024-04-27T18:55:21Z,aqib-mirza, 2265825470,2081155829,Thanks a ton sir! I will check this.,2024-04-27T19:13:59Z,Dineshkumar-Anandan-ZS0367, 2265825470,2081188532,"Same prompt and same OCR text from an image. Each request, the LLM gives different results; how can I keep the results consistent? Are there any options for this? I understand this is an LLM. 
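On the repeatability question just above (and the JSON question earlier in this thread): a sketch, following the same transformers pipeline setup as the example code above, that spells the target JSON schema out in the prompt and uses greedy decoding so the same OCR text yields the same output. The model id and schema are assumed placeholders:

```python
import torch
import transformers

# Assumed model id; any Llama 3 Instruct checkpoint you have access to works the same way
# (add token=... to the pipeline call if the checkpoint is gated for you).
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.float16},
    device="cuda",
)

# Enumerate the exact keys you want back; the model is far more consistent when the
# schema is spelled out than when asked generically for "JSON".
messages = [
    {"role": "system", "content": 'Extract key-value pairs from the user text. '
                                  'Reply with only JSON of the form '
                                  '{"invoice_number": str, "date": str, "total": str}.'},
    {"role": "user", "content": "<OCR text goes here>"},
]
prompt = pipeline.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=False,  # greedy decoding: identical input gives identical output
)
print(outputs[0]["generated_text"][len(prompt):])
```

There is no format argument on apply_chat_template (extra keyword arguments are simply handed to the template renderer), so the format="JSON" flag in the snippet above has no effect; the schema has to live in the prompt, and it still pays to validate the reply with json.loads and retry on failure.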
Can you suggest some ideas for prompt to extract key value pairs in a paragraph.",2024-04-27T21:34:10Z,Dineshkumar-Anandan-ZS0367, 2265825470,2081880842,"Getting same result as before inspite of using prompt = pipeline.tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True, format = ""JSON"" ) ",2024-04-29T04:44:19Z,Dineshkumar-Anandan-ZS0367, 2265825470,2243230375,Having the same problem. Any update on this? Or any prompt hint?,2024-07-22T15:24:13Z,LDelPinoNT, 2265825470,2243489254,"> Having the same problem. Any update on this? Or any prompt hint? You need to explicitly mention you JSON Structure in the prompt. Its the only way to get expected JSON format. If you have got any other tokes in output, add post process logic inside your code.",2024-07-22T17:44:04Z,Dineshkumar-Anandan-ZS0367, 2265825470,2257724153,"you can try lower the temperature hyperparameters > Same prompt and same ocr text from image. Each request the llm gives different results, how can I maintain the results. > > Is there any options for this, I understand this is a llm. > > Can you suggest some ideas for prompt to extract key value pairs in a paragraph. ",2024-07-30T08:03:16Z,YanJiaHuan, 2265825470,2257874433,Thanks a lot for the response William,2024-07-30T09:16:04Z,Dineshkumar-Anandan-ZS0367, 2265310492,2078975511,"OK it looks like it had reverted to CPU installation of PyTorch somehow, attempting GPU installation again,",2024-04-26T09:13:12Z,AdamMiltonBarker, 2265310492,2080242443,This was resolved by reinstalling PyTorch,2024-04-26T23:51:21Z,AdamMiltonBarker, 2265310492,2200071021,Do you have a GPU to run it? What if I run it on a CPU-only machine? I install the pytorch CPU version but have the same error with you. How should I deal with it? Thanks a lot!,2024-07-01T12:55:15Z,rain7996, 2265156271,2093669988,"Hi, thank you for your work. May I ask what version of transformer you have and how you load the checkpoint? Mine seems to keep reporting torch shape error for checkpoint when using CPU because of GQA.",2024-05-03T19:51:11Z,Papapapapapaya, 2265156271,2094759928,hmm... I just did what it said in the readme.md file. I just downloaded by pip.,2024-05-05T11:04:11Z,HaShaWB, 2265156271,2096834921,"> hmm... I just did what it said in the readme.md file. I just downloaded Could you please confirm which version you have successfully run, 8B or 70B or both?",2024-05-06T20:18:52Z,Papapapapapaya, 2265156271,2109124129,"I've gotten both 8B and 70B (non-chat) running on a CPU. This will _probably_ work for the chat models, but I haven't checked those. You will need at least ~64GB of RAM to run 8B on a CPU, and at least ~320GB of RAM to run 70B, with and set to relatively small values. Below is the code to load the model and tokenizer, adapted from There is a small but crucial difference from tloen's code in what's below. `{python} import json from pathlib import Path import llama tokenizer_path = '...' # replace with your local path to tokenizer.model ckpt_dir = '...' 
# replace with your local path to the directory containing the model max_seq_len = 4 # replace with whatever max seq len you want max_batch_size = 1 # replace with whatever max batch size you want tokenizer = llama.Tokenizer(model_path=tokenizer_path) checkpoints = sorted(Path(ckpt_dir).glob('*.pth')) with 'r') as f: params = json.loads(f.read()) model_args = llama.ModelArgs( max_seq_len=max_seq_len, max_batch_size=max_batch_size, **params ) model_args.vocab_size = tokenizer.n_words model = llama.Transformer(model_args) # Original copyright by tloen # key_to_dim = { ""w1"": 0, ""w2"": -1, ""w3"": 0, ""wo"": -1, ""wq"": 0, ""wk"": 0, ""wv"": 0, ""output"": 0, ""tok_embeddings"": 0, # This MUST be 0 for Llama 3, unlike LLaMA or Llama 2, which use -1 ""ffn_norm"": None, ""attention_norm"": None, ""norm"": None, ""rope"": None, } for i, ckpt in enumerate(checkpoints): checkpoint = torch.load(ckpt, map_location='cpu') for parameter_name, parameter in model.named_parameters(): short_name = parameter_name.split(""."")[-2] if key_to_dim[short_name] is None and i == 0: parameter.data = checkpoint[parameter_name] elif key_to_dim[short_name] == 0: size = checkpoint[parameter_name].size(0) parameter.data[size * i: size * (i + 1), :] = checkpoint[ parameter_name ] elif key_to_dim[short_name] == -1: size = checkpoint[parameter_name].size(-1) parameter.data[:, size * i: size * (i + 1)] = checkpoint[ parameter_name ] del checkpoint[parameter_name] del checkpoint model.to('cpu') generator = llama.Llama(model, tokenizer) ` is now your (non-Hugging Face) Llama 3 model!",2024-05-14T01:47:47Z,mawilson1234, 2263939327,2077665023,I mean transformers.PretrainedTokenizer class,2024-04-25T16:12:23Z,tian969, 2263939327,2078650940,same question,2024-04-26T05:19:56Z,ppaanngggg, 2263939327,2081699087,"I find the solution, you should use model files on huggingface. There is a tokenizer.json file can be loaded directly.",2024-04-28T23:35:18Z,ppaanngggg, 2263939327,2088820411,"Yes, you can use ",2024-05-01T17:38:28Z,subramen, 2262621161,2088673226,"Hi, thanks for submitting your PR. Please take a look at which already contains many examples of using llama 3 from cloud providers and finetuning.",2024-05-01T15:55:17Z,subramen, 2262621161,2089273133,"> Hi, thanks for submitting your PR. Please take a look at which already contains many examples of using llama 3 from cloud providers and finetuning. Okay, thanks. But for those who are new to LLAMA3, here's a document I put together for you. It's a shame it wasn't uploaded because it has a lot of useful information.",2024-05-01T23:12:58Z,jh941213, 2262494274,2079258793,"I've faced the same issue, and I found out that adding at the end of the command could solve the problem. I hope it helps you.",2024-04-26T12:06:54Z,ailuropodaWu, 2262446576,2076182636,"+1, I have spent 6 days",2024-04-25T01:52:19Z,for-just-we, 2261554713,2075538658,"Hi, could you please provide more information on the issue - are you facing this when you run the download.sh file, are you using WSL or windows cmd line, what's your hardware specifications? Please note that the example scripts in this repo are for running inference on single (for 8B) and multi (for 70B) GPU setups using CUDA, but Windows is not currently supported. 
You might want also like to check out these examples for running Llama locally without distributed via hugging face or ollama ",2024-04-24T18:06:11Z,fbnav, 2261554713,2106161247," hello ,i have input the URL,but show this issue",2024-05-12T08:13:39Z,fgd0707, 2261554713,2106173327," now ,i can not download the models, because 403",2024-05-12T09:00:07Z,fgd0707, 2261554713,2116212588,"Adding a **proxy** to your order from other districts will fix your problem. > now ,i can not download the models, because 403 ",2024-05-16T21:22:20Z,Howe-Ren, 2261515080,2075464622,"Hey thanks for reporting this issue. I'm assuming that you're issue report is about the special tokens being in the output? I think that there a couple things for you to reference to help with this, first, I'd recommend checking out the prompt template and special tokens documentation that we have here on llama-recipes You'll also note in the following llama-recipes some examples of ollama usage.",2024-04-24T17:23:42Z,ejsd1989, 2261477957,2075122671,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-04-24T14:44:22Z,facebook-github-bot, 2260179063,2090811660,same,2024-05-02T15:29:11Z,ERmak148, 2260179063,2146254017,Can you share your hf user ids?,2024-06-03T22:48:36Z,samuelselvan, 2260179063,2187313777, please feel free to re-open with with your hf user id if still facing issues.,2024-06-24T20:05:21Z,samuelselvan, 2260179063,2190409922,I had the same question. My hf id is Henry65. Please,2024-06-26T02:19:42Z,HenryStephen, 2260179063,2266713411," I had the same question. My hf id is ZhenbinChan.",2024-08-03T13:30:58Z,BiNLP, 2260048325,2110564954,"The change helps improve readability, lgtm",2024-05-14T15:42:52Z,subramen, 2260048325,2195887784,Any chance of this ? 🙏 ,2024-06-28T00:46:39Z,pchng, 2260048325,2206801636,Thanks for your contribution @pchng!,2024-07-03T17:00:11Z,subramen, 2259605000,2077945734,Have you found a fix?,2024-04-25T18:44:48Z,yohlimem, 2259605000,2078923132,No. I also tried on huggingface and same issue. 
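Related to the special-tokens-in-the-output report in the 2261515080 thread above: a sketch of the Llama 3 Instruct turn format from the model card. If <|eot_id|> is not treated as a stop token, it (and the next header) tends to leak into generations:

```python
# Llama 3 Instruct chat layout: one system turn, one user turn, then the cue for the
# assistant turn. Generation should stop on <|eot_id|> or <|end_of_text|>.
PROMPT = (
    "<|begin_of_text|>"
    "<|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Hello!<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
```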
Also for llama2 same issue,2024-04-26T08:43:35Z,ryzeto, 2259605000,2146264893, are you still facing issues?,2024-06-03T22:59:18Z,samuelselvan, 2259605000,2187312462,Please free to re-open if still facing issues.,2024-06-24T20:04:29Z,samuelselvan, 2259498373,2078544548,"same, how do you resolve it?",2024-04-26T02:54:00Z,AWangji, 2259061145,2075463982, the support for llama3 is only added to the latest transformers version can you pls upgrade to ,2024-04-24T17:23:18Z,HamidShojanazeri, 2259061145,2075582868," yes our team realized that and I updated to 4.40.0 last night, still had same issue, is 4.40.1 necessary vs 4.40.0 ?",2024-04-24T18:33:11Z,yangyangyyy123, 2258929364,2072364761,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-04-23T13:45:54Z,facebook-github-bot, 2258912709,2072348866,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-04-23T13:38:27Z,facebook-github-bot, 2258912709,2072414317,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-04-23T14:07:45Z,facebook-github-bot, 2258912709,2110572082,Thanks I think it might be better to add the asserts in the function instead of in the example scripts. ,2024-05-14T15:46:22Z,subramen, 2258912709,2122690596," Thanks for the suggestion! Thats better, I've moved the assertions into the build function of the Llama class 😊 ",2024-05-21T13:52:13Z,aakashapoorv, 2258882543,2072319539,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. 
# Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-04-23T13:25:22Z,facebook-github-bot, 2258882543,2072323070,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-04-23T13:26:54Z,facebook-github-bot, 2258617279,2073940621,"I found that the LLaMA-3 tokenizer uses a Latin-1 encoder. Weird...! Why not just for all simplification. Any hint? Steve",2024-04-24T03:23:13Z,thusinh1969, 2258617279,2075469395,"Llama 3 is using the tiktoken tokenizer now, can you pls check the discussion here, ",2024-04-24T17:26:47Z,HamidShojanazeri, 2258617279,2358806568," are the reserved special tokens such as ""<|reserved_special_token_246|>"" replaceable? I want to add some additional special tokens.",2024-09-18T15:36:09Z,Pranil51, 2258302237,2073764061,Maybe supporting configurable backends for torch.distributed is an option? ,2024-04-24T00:58:32Z,ccozad, 2258302237,2075060228,"Hi! The example scripts in this repo are for running inference on single (for 8B) and multi (for 70B) GPU setups using CUDA, but Windows is not currently supported. You might want to check out these examples for running Llama locally without distributed via hugging face or ollama ",2024-04-24T14:16:27Z,subramen, 2258302237,2075422741," Thank you for the confirmation. I set up a linux machine on AWS and got things to run. I put together a guide here: Perhaps in the future Microsoft, Nvidia and other vendors will open more options to put gaming computers to good use.",2024-04-24T16:57:55Z,ccozad, 2258302237,2075819620," See my comment on #127 , I was able to get the model to build on Windows by initializing the backend before calling Llama.build()",2024-04-24T20:52:08Z,ccozad, 2258153663,2071729257,"Working on this issue, inside llama.cpp's guts rn. Will prolly figure something out. In the meantime, any help would be appreciated.",2024-04-23T08:32:31Z,Codedestructor56, 2258153663,2071980500,"The issue: I wrote ""model"" instead of ""models"", sorry about that :)",2024-04-23T10:43:36Z,Codedestructor56, 2258115199,2073708558,"Aaargh, how do I solve this? +1, it's driving me crazy.",2024-04-24T00:29:52Z,Michael4933, 2258115199,2073876339,"llama3 8B already uses GQA, whereas in llama2 the 7B and 13B models do not use GQA and only the 70B does. Just make sure to change the default n_kv_heads and n_heads accordingly.",2024-04-24T02:19:24Z,Xu107, 2258115199,2073972091," > llama3 8B already uses GQA, whereas in llama2 the 7B and 13B models do not use GQA and only the 70B does. Just make sure to change the default n_kv_heads and n_heads accordingly. What should n_kv_heads and n_heads be changed to? Could you let me know? Thank you.",2024-04-24T03:50:45Z,jidandan666, 2258115199,2074815240,"have you tried different n_head? I saw n_head of llama3-70b is 64, but I haven’t tried this.",2024-04-24T12:19:05Z,suyicon, 2258115199,2074817528,"> have you tried different n_head? I saw n_head of llama3-70b is 64, but I haven’t tried this. 
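On the GQA head-count discussion above: a sketch of the key/value repetition step that older Llama 2-style attention code is missing when it loads the Llama 3 8B weights (32 query heads but only 8 KV heads). It mirrors the repeat_kv helper a later comment in this thread describes adding:

```python
import torch

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand (bs, seqlen, n_kv_heads, head_dim) to (bs, seqlen, n_kv_heads * n_rep, head_dim)
    so that grouped-query keys/values line up with the query heads."""
    bs, slen, n_kv_heads, head_dim = x.shape
    if n_rep == 1:
        return x
    return (
        x[:, :, :, None, :]
        .expand(bs, slen, n_kv_heads, n_rep, head_dim)
        .reshape(bs, slen, n_kv_heads * n_rep, head_dim)
    )

# Llama 3 8B: 32 query heads, 8 KV heads, so each KV head is repeated 4 times.
keys = torch.randn(1, 16, 8, 128)
print(repeat_kv(keys, n_rep=32 // 8).shape)  # torch.Size([1, 16, 32, 128])
```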
BTW, when I load llama3-70b, there is no error.",2024-04-24T12:20:19Z,suyicon, 2258115199,2074964336," ""num_hidden_layers"": 32, ""num_key_value_heads"": 8, 是不是把config.py里面的这两个改成一样就行了",2024-04-24T13:33:50Z,jidandan666, 2258115199,2075474695,"it seems related to converting the weights to HF? if so can you pls use the latest conversion script from HF. merged from this PR, ",2024-04-24T17:30:10Z,HamidShojanazeri, 2258115199,2076849704,"problem has been successfully solved by updating transformers to 4.40.1, the lastest version that seems to support llama3. Yes you hear me! Though as easy and stupid as it seems, such seemingly recondite ERROR can be addressed by updating your transformers package. SOLUTION: pip install transformers==4.40.1",2024-04-25T10:19:32Z,Michael4933, 2258115199,2077156793,"> problem has been successfully solved by updating transformers to 4.40.1, the lastest version that seems to support llama3. Yes you hear me! Though as easy and stupid as it seems, such seemingly recondite ERROR can be addressed by updating your transformers package. SOLUTION: pip install transformers==4.40.1 thinks, it work!",2024-04-25T13:15:44Z,jidandan666, 2258115199,2079309575,"> > problem has been successfully solved by updating transformers to 4.40.1, the lastest version that seems to support llama3. Yes you hear me! Though as easy and stupid as it seems, such seemingly recondite ERROR can be addressed by updating your transformers package. SOLUTION: pip install transformers==4.40.1 > > thinks, it work! Do you meet another problem like: ImportError: Using 8-bit quantization requires Accelerate: and the latest version of bitsandbytes: If not, please tell me your version of accelerate and byresandbytes, thank you!",2024-04-26T12:40:15Z,suyicon, 2258115199,2079544323,"> bitsandbytes > > > problem has been successfully solved by updating transformers to 4.40.1, the lastest version that seems to support llama3. Yes you hear me! Though as easy and stupid as it seems, such seemingly recondite ERROR can be addressed by updating your transformers package. SOLUTION: pip install transformers==4.40.1 > > > > > > thinks, it work! > > Do you meet another problem like: ImportError: Using 8-bit quantization requires Accelerate: and the latest version of bitsandbytes: > > If not, please tell me your version of accelerate and byresandbytes, thank you! haven't encountered that problem,the version of bitsandbytes== 0.43.1, and the version of accelerate == 0.29.3",2024-04-26T14:48:35Z,jidandan666, 2258115199,2079601019,"> > bitsandbytes > > > > > problem has been successfully solved by updating transformers to 4.40.1, the lastest version that seems to support llama3. Yes you hear me! Though as easy and stupid as it seems, such seemingly recondite ERROR can be addressed by updating your transformers package. SOLUTION: pip install transformers==4.40.1 > > > > > > > > > thinks, it work! > > > > > > Do you meet another problem like: ImportError: Using 8-bit quantization requires Accelerate: and the latest version of bitsandbytes: > > If not, please tell me your version of accelerate and byresandbytes, thank you! > > haven't encountered that problem,the version of bitsandbytes== 0.43.1, and the version of accelerate == 0.29.3 Thank you very much!",2024-04-26T15:18:35Z,suyicon, 2258115199,2149063350,"> problem has been successfully solved by updating transformers to 4.40.1, the lastest version that seems to support llama3. Yes you hear me! 
Though as easy and stupid as it seems, such seemingly recondite ERROR can be addressed by updating your transformers package. SOLUTION: pip install transformers==4.40.1 thanks,it works",2024-06-05T07:20:52Z,J3Z2Y9, 2258115199,2157473430," > > llama3 8B 即使用了GQA,然而在llama2中7B和13B是没有使用GQA的,只有70B才使用了GQA,注意修改默认的n_kv_heads和n_heads即可。 > > n_kv_heads和n_heads改成什么,能告知一下吗。谢谢了 We shall use the value ""num_key_value_heads"" from config.json to initialize the projection head dimensions, i.e., self.num_key_value_heads used below: ",2024-06-10T06:56:39Z,OscarYau525, 2258115199,2216860251,"We can transfer llama2 code to llama3 style. According to OscarYau525's answer, we know how to change LlamaAttention in initializing function. However, only change the definition of self.k_proj and self.v_proj cannot dismiss the bug. That is because the size of k and v is different from the q now. So we can add a function called repeat_kv as: In forward function, we can add: to align the size between kv and q. Then the llama2 code can run successfully in llama3 weights. ",2024-07-09T07:53:01Z,AsteriaCao, 2258061223,2071551778,The same issue.,2024-04-23T06:52:36Z,guoqiangqi, 2258061223,2071576249,I guess because you didn't use a proxy or your IP is blocked. I solve this by using clash.,2024-04-23T07:03:01Z,ghLcd9dG, 2258061223,2097194992,same issue. Did anyone get the issue solved?,2024-05-07T01:27:20Z,linhdangduy, 2258061223,2098153544,"Me too, now I don‘t know how to get the model😢",2024-05-07T11:10:25Z,Xer12306, 2258061223,2156188049,same ,2024-06-08T21:19:41Z,BrandWorksApp, 2258005940,2071544568,"Could you check your CUDA installation? The NCCL error generally occurs when you have a CUDA incompatibility. Try updating your drivers, and this should be fixed in the majority of the cases.",2024-04-23T06:50:36Z,srimouli04, 2258005940,2071657268,"Thank you for your help. But I have uninstalled cuda and cudnn, and then reinstalled them. And configure the environment variables, then restart the computer and recreate the virtual environment, but still get the same error btw: Other programs are running success, such as SD.",2024-04-23T07:53:51Z,s084088, 2258005940,2073735844,I believe I ran into a similar problem as documented in #132 ,2024-04-24T00:44:08Z,ccozad, 2258005940,2075487137," as you running on Windows, can you please check here and use to see if that work. ",2024-04-24T17:37:25Z,HamidShojanazeri, 2258005940,2075800144," Running this code ` produced this result So that's promising!",2024-04-24T20:38:08Z,ccozad, 2258005940,2076253238,"> as you running on Windows, can you please check here and use to see if that work. Great! It's working!",2024-04-25T02:43:33Z,s084088, 2257943316,2075497100," can you please explain your use-case a bit, if you are extending just a small number of vocab perhaps no, but if you are adding a language to the model, you might need to do continue pretraining, please take a look at this work.",2024-04-24T17:41:20Z,HamidShojanazeri, 2257943316,2081307335,"> can you please explain your use-case a bit, if you are extending just a small number of vocab perhaps no, but if you are adding a language to the model, you might need to do continue pretraining, please take a look at this work. Only for LLaMA-2. LLaMA-3 tokenizer is way off (and being fixed for buggy decoder).",2024-04-28T03:16:50Z,thusinh1969, 2257888577,2078660358,"Same for me. I've been trying for a few days. ""Sorry, we could not process your request at this moment. 
Request ID: 242688205555773""",2024-04-26T05:30:24Z,bsariturk, 2257888577,2079816622,"Did you previously sign up using the same email for Llama-2 access? I did and I also got an error: I resolved the error by typing ""Massachusetts Institute of Technology"" instead of ""MIT"" in the affiliation field. I think possibly because the former is what I used when I requested Llama-2 access. Maybe your information has to match exactly in order for the form to go through.",2024-04-26T17:41:53Z,ed1d1a8d, 2257888577,2079849004,"> Did you previously sign up using the same email for Llama-2 access? I did and I also got an error: > > I resolved the error by typing ""Massachusetts Institute of Technology"" instead of ""MIT"" in the affiliation field. I think possibly because the former is what I used when I requested Llama-2 access. Maybe your information has to match exactly in order for the form to go through. I tried it once again by changing the affiliation field as you suggested, and it worked! Thank you!",2024-04-26T18:02:49Z,AnandUgale, 2257888577,2156517718,"HTTP request sent, awaiting response... 403 Forbidden 2024-06-09 18 27 ERROR 403: Forbidden. while downloading the meta llama 3 model ",2024-06-09T12:39:57Z,avinashmyerolkar, 2257856451,2071333802,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-04-23T03:19:09Z,facebook-github-bot, 2257620827,2071112005,"Hi Thank you for your pull request and welcome to our community. # Action Required In order to merge **any pull request** (code, docs, etc.), we **require** contributors to sign our **Contributor License Agreement**, and we don't seem to have one on file for you. # Process In order for us to review and merge your suggested changes, please sign at < **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the **pull request will be tagged** with . The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it. If you have received this in error or have any questions, please contact us at [cla Thanks!",2024-04-22T23:22:33Z,facebook-github-bot, 2257620827,2071150998,Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!,2024-04-23T00:06:49Z,facebook-github-bot, 2257620827,2077571285,"Sorry, we are keeping the model card to contributors from the core team. I love the zeal though!!",2024-04-25T15:25:14Z,jspisak, 2257288347,2070727138,Hope that helps. With this lovely semi-open source project. 
,2024-04-22T19:18:28Z,Radonchnk, 2256945419,2071516868,Exactly why we have to pretrain and finetune again !,2024-04-23T06:40:15Z,thusinh1969, 2256945419,2075515315,"Thank you for pointing this out! Even though the tokenizer has multilingual vocabulary, currently Llama3 doesn't support multilingual inference. Currently the models are officially supported for inference in English, but as mentions, finetuning is an option here. We have an example using Llama 2 here : ",2024-04-24T17:52:12Z,subramen, 2256631184,2079779809,"good idea - let me add these! ",2024-04-26T17:15:10Z,jspisak, 2256455121,2076426069,same,2024-04-25T05:56:37Z,AWangji, 2256455121,2092062924,"As stated here >Keep in mind that the links expire after 24 hours and a certain amount of downloads. If you start seeing errors such as , you can always re-request a link. Maybe you need to re-request for the permission.",2024-05-03T03:28:55Z,nguyenthekhoig7, 2256455121,2095999162,"I re-requested, but not working, still 403 forbidden.",2024-05-06T13:17:09Z,ZCyueternal, 2256437092,2074480371,Is the problem solved?I also encountered this problem,2024-04-24T09:14:36Z,liu904-61, 2256437092,2075523450, can you pls upgrade to the latest transformers . This should have the latest.,2024-04-24T17:56:56Z,HamidShojanazeri, 2256437092,2079975546,hey I am having the same issue after upgraded the transformers to 4.40.1 ,2024-04-26T19:08:01Z,EmilyInTheUS, 2256437092,2095198802," I also encountered this problem. Is the problem solved? ",2024-05-06T04:57:54Z,Xiaoyinggit, 2256437092,2151329259,"You need to change your function, the function I used in the .py script is **LlamaTokenizer.from_pretrained()** and you just need to change it to **AutoTokenizer.from_pretrained()**.",2024-06-06T03:05:42Z,xieziyi881, 2256415641,2069284826,"hello,LeoStrange26. could you tell me how to fix the 403 error when run download.sh. thank you any help.",2024-04-22T12:35:08Z,HYTHYThythyt, 2256415641,2069309568,"Download wget from {Select your hardware specific version} Download and install Git Bash. Copy the wget.exe file in directory. Run that download.sh from Git Bash. This worked for me on windows",2024-04-22T12:46:58Z,POTUSAITEJA, 2256415641,2069354703,"My PC's OS is macOS and the server is Ubuntu, I have tried many measures to fix it, but nothing can work. could you upload your pre-trained weigths in git if you convenience? So I can download. thank you for your answers. ",2024-04-22T13:03:22Z,HYTHYThythyt, 2256415641,2071334842,"> hello,LeoStrange26. could you tell me how to fix the 403 error when run download.sh. thank you any help. Change the IP address, such as US.",2024-04-23T03:20:29Z,lifetruth-liu, 2256415641,2071482641,"> what's the difference between llama3-8b and llama3-8b instruct? if i want to deal with the general text generation task, which one is better? llama3-8B is the base model which basically just do the completions to the input prompt, But llama3-8B Instruct is finetuned for instruction following and multi-turn conversation templates for assistant completions as chat response. If your specific purpose is for chat completions then instruct is the best choice other wise if its for simple completions of input then base model is fine. But there might be a chance for the model to continue generation till max_seq_len is achieved while generating while using base model.",2024-04-23T06:11:23Z,AswanthManoj, 2256415641,2080059402,"> hello,LeoStrange26. could you tell me how to fix the 403 error when run download.sh. thank you any help. 
Link might have expired. Fill out the agreement form to get a new link.",2024-04-26T20:14:49Z,jatin0801, 2256415641,2088259090,I want to labelize json's objects which is better for my task the 8b or 8b instruct?,2024-05-01T10:23:25Z,kobbinour13, 2256415641,2089863251,"> I want to labelize json's objects which is better for my task the 8b or 8b instruct? Could you please clarify what you mean by ""labelize""? Are you trying get a valid json as the response from the model? If then instruct version would be better",2024-05-02T08:11:21Z,AswanthManoj, 2256348751,2075527502, would the list of messages make it?,2024-04-24T17:59:29Z,HamidShojanazeri, 2256348751,2076174280,"It doesn't seem to work. Reasons: 1) Inference time is the same as a single inference, 2) console warnings appear one by one, it can be inferred that the model is read one by one Here is the code for batch inference: ",2024-04-25T01:42:20Z,code-isnot-cold, 2256348751,2126653010,cc @Rocketknight1,2024-05-23T09:30:04Z,ArthurZucker, 2256348751,2127131096,"Hi great question! The short answer is that the text generation pipeline will only generate one sample at a time, so you won't gain any benefit from batching samples together. If you want to generate in a batch, you'll need to use the lower-level method instead, and it's slightly more complex. However, you can definitely get performance benefits from it. You'll need to tokenize with , and , and you'll need to set a . The reason for this is that the sequences will have different lengths when you batch them together. Try this code snippet: ",2024-05-23T13:37:15Z,Rocketknight1, 2256348751,2128800255,"Thank you for your detailed explanation . I have started using the vllm method, which enables efficient inference. But I'll try to use the model.generate() method for batch generation. Thanks again for your help ",2024-05-24T07:34:13Z,code-isnot-cold, 2256348751,2129366476,my pleasure! 🤗 ,2024-05-24T12:04:33Z,ArthurZucker, 2256348751,2132115460,"I wrote my code based on 's. I am a transformers beginner and I hope that there isn't any bug in my code. **Code:** **Output:** ",2024-05-26T07:41:15Z,mirrorboat, 2256348751,2137264729,"> Thank you for your detailed explanation . I have started using the vllm method, which enables efficient inference. But I'll try to use the model.generate() method for batch generation. Thanks again for your help Would you please share your llama3 vllm inference code? I've search it in but failed to find a suitable script.",2024-05-29T12:13:52Z,mirrorboat, 2256348751,2137392863,"Sure, This is a website for your reference: I find that vllm seems to be inferior to transformers method in batch inference. Maybe there is something wrong with my code, please communicate more after trying it ",2024-05-29T13:16:32Z,code-isnot-cold, 2256348751,2137653435," Here ",2024-05-29T15:09:24Z,mirrorboat, 2256348751,2139900465,"I read the issue and tried your code, which worked perfectly. Thank you for your contribution",2024-05-30T15:16:22Z,code-isnot-cold, 2256348751,2370017321,"> I wrote my code based on 's. I am a transformers beginner and I hope that there isn't any bug in my code. 
**Code:** > > `python > import torch > from transformers import AutoModelForCausalLM, AutoTokenizer > import time > > model_id = > tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side = ""left"") > tokenizer.pad_token_id = tokenizer.eos_token_id > model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map=""auto"") > terminators = [ > tokenizer.eos_token_id, > tokenizer.convert_tokens_to_ids(""<|eot_id|>"") > ] > > myinput=[ > [{""role"": ""user"", ""content"": ""1 + 1 = ""}], > [{""role"": ""user"", ""content"": ""Introduce C++ in one short sentence less than 10 words.""}], > [{""role"": ""user"", ""content"": ""Who was the first president of the United States? Answer in less than 10 words.""}], > [{""role"": ""user"", ""content"": ""What is the capital of France ? Answer in less than 10 words.""}], > [{""role"": ""user"", ""content"": ""Why is the sky blue ? Answer in less than 10 words.""}], > [{""role"": ""user"", ""content"": ""What is the meaning of life? Answer in less than 10 words.""}], > [{""role"": ""user"", ""content"": ""What is the best way to learn a new language? Answer in less than 10 words.""}], > [{""role"": ""user"", ""content"": ""When is the best time to plant a tree? Answer in less than 10 words.""}], > [{""role"": ""user"", ""content"": ""What is the best way to cook an egg? Answer in less than 10 words.""}], > [{""role"": ""user"", ""content"": ""Which is the best programming language? Answer in less than 10 words.""}] > ] > texts = tokenizer.apply_chat_template(myinput, add_generation_prompt=True, tokenize=False) > inputs = tokenizer(texts, padding=""longest"", return_tensors=""pt"") > inputs = {key: val.cuda() for key, val in inputs.items()} > temp_texts=tokenizer.batch_decode(inputs[""input_ids""], skip_special_tokens=True) > > start_time = time.time() > gen_tokens = model.generate( > **inputs, > max_new_tokens=512, > pad_token_id=tokenizer.eos_token_id, > eos_token_id=terminators, > do_sample=True, > temperature=0.6, > top_p=0.9 > ) > print(f""Time: {time.time()-start_time}"") > > gen_text = tokenizer.batch_decode(gen_tokens, skip_special_tokens=True) > gen_text = [i[len(temp_texts[idx]):] for idx, i in enumerate(gen_text)] > print(gen_text) > ` > > **Output:** > > ` > Time: 2.219297409057617 > ['2', 'C++ is a powerful, compiled, object-oriented programming language.', 'George Washington, first president of the United States.', 'The capital of France is Paris.', 'Scattered sunlight by tiny molecules in atmosphere.', 'To find purpose, happiness, and fulfillment through experiences.', 'Immerse yourself in the language through listening and speaking.', ""In your area's dormant season, typically late winter or early spring."", 'Poach it in simmering water for a perfect yolk.', 'There is no single ""best"" language, it depends on context.'] > ` i'm sorry for running this with bug,here is the bug saying: Traceback (most recent call last): File ""batch_inference.py"", line 26, in texts = tokenizer.apply_chat_template(myinput, add_generation_prompt=True, tokenize=False) File line 1743, in apply_chat_template rendered = compiled_template.render( File line 1301, in render self.environment.handle_exception() File line 936, in handle_exception raise rewrite_traceback_stack(source=source) File ""
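The traceback at the end is cut off, but the failure happens inside apply_chat_template's Jinja rendering; one common cause is a transformers version whose chat-template path expects a single conversation rather than a batch. A sketch that reuses the tokenizer, model, myinput, and terminators from the quoted snippet but renders the template one conversation at a time, then batches the left-padded prompts for model.generate():

```python
# Render each conversation separately so nothing depends on batched apply_chat_template
# support, then batch-generate with the left-padded tokenizer (padding_side="left" and
# pad_token already set in the snippet above).
texts = [
    tokenizer.apply_chat_template(conv, add_generation_prompt=True, tokenize=False)
    for conv in myinput
]
inputs = tokenizer(texts, padding="longest", return_tensors="pt").to(model.device)

gen_tokens = model.generate(
    **inputs,
    max_new_tokens=512,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# With left padding every prompt ends at the same index, so slicing off the prompt
# length leaves only the newly generated tokens.
new_tokens = gen_tokens[:, inputs["input_ids"].shape[1]:]
print(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
```

If the batched template call itself is what fails, upgrading transformers is the other obvious thing to try.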