NumeroIssue,IdIssue,TituloIssue,DescricaoIssue,CriacaoIssue,RepositorioIssue,LinkIssue 1327,2866903049,.llama\\checkpoints\\Llama-3.2-3B-Instruct vs. .llama\\checkpoints\\Llama3.2-3B-Instruct,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug I ran the following command to download Llama-3.2-3B-Instruct llama model download --source meta --model-id After downloading, I see the files ended up in Notice that after the word ""Llama"" there is no dash before the 3.2. Can this be repaired in some way? Do all models have this inconsistency? ### Minimal reproducible example ## Runtime Environment - Model: Llama-3.2-3B-Instruct - Using via huggingface?: No - OS: Windows 11 - GPU VRAM: 2GB - Number of GPUs: 1 - GPU Make: Nvidia **Additional context** Add any other context about the problem or environment here. ",2025-02-20T18:00:11Z,llama,https://github.com/meta-llama/llama/issues/1327 1326,2858401277,Add LlamaSafetyOptimizer for Runtime Safety Checks and Performance Optimization,"Changes Made and Why I've implemented a new module called LlamaSafetyOptimizer that wraps around the existing Llama model to provide safety checks, performance monitoring, and memory optimization capabilities. The specific changes include: Added a new file containing: LlamaSafetyOptimizer class for wrapping Llama models PerformanceMetrics dataclass for tracking performance statistics Methods for safety validation, memory tracking, and batch size optimization Created unit tests to verify the functionality of the new module: Tests for initialization Tests for memory tracking capabilities Tests for safety check mechanisms Tests for the safe forward pass Provided a simple example implementation showing how to use the optimizer with an existing Llama model These changes were necessary to enhance the safety and performance monitoring capabilities of Llama models in production environments, where both safety guardrails and resource optimization are critical concerns. Project Improvements This PR improves the project in several key ways: Enhanced Safety: Adds runtime validation of model outputs to detect potentially problematic generation patterns Resource Optimization: Automatically finds the optimal batch size based on available memory Performance Monitoring: Tracks and reports on inference time, memory usage, and GPU utilization Easy Integration: Designed as a wrapper that can be added to existing models with minimal code changes Testability: Includes comprehensive unit tests to ensure reliability Testing Performed I've conducted the following tests to ensure the new module works correctly: Unit Tests: Created pytest-based tests for all main components: Initialization with different parameters Memory tracking functionality (CPU and GPU when available) Safety check algorithms Performance monitoring accuracy Integration Testing: Tested with a simplified Llama model to verify correct behavior Verified that performance metrics are collected accurately Confirmed that batch size optimization works as expected All tests pass successfully, demonstrating that the module performs as intended. Additional Notes This implementation is designed to be non-intrusive and can be enabled or disabled based on the specific deployment needs. The safety checks are currently based on simple statistical analysis of model outputs, but the framework is extensible to incorporate more sophisticated safety mechanisms in the future. The memory tracking components are compatible with both CPU-only and GPU environments, with appropriate fallbacks when CUDA is not available. I welcome feedback on: The safety metrics implementation - are there additional checks that would be valuable? Performance optimization strategies - any suggestions for further reducing memory overhead? Any edge cases I might have missed in the testing",2025-02-17T17:07:35Z,llama,https://github.com/meta-llama/llama/pull/1326 1325,2851655361,[Edited] Refactor code to optimize performance,"Changes made in branch: **MayureshMore:main** [Edited] Refactor code to optimize performance ",2025-02-13T17:17:59Z,llama,https://github.com/meta-llama/llama/pull/1325 1324,2850297325,Update setup.py by muneeb, ,2025-02-13T08:17:27Z,llama,https://github.com/meta-llama/llama/pull/1324 1323,2841504072,OSError: Missing model files in the Llama directory,"## Describe the bug I installed the llama 2-7b model from the official Llama website and followed the instruction. But I encountered an error when trying to load the Llama model from the directory The error message indicates that the necessary model files (pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index, or flax_model.msgpack) are not found in the specified directory. ### Output ## Runtime Environment - Model: - Using via huggingface?: no - OS: Windows ",2025-02-10T06:51:15Z,llama,https://github.com/meta-llama/llama/issues/1323 1321,2811432369,"403 ""/api/chat"""," [GIN] - 17 29 | 403 | 21.917µs | 127.0.0.1 | POST [GIN] - 17 24 | 200 | 5.384040583s | 127.0.0.1 | POST If you add ` -H 'Origin: `, 403 will be returned. ",2025-01-26T09:50:33Z,llama,https://github.com/meta-llama/llama/issues/1321 1319,2773692295,Is there a way to run llama2 in the new repo?,"I understand this repo has been deprecated. I would like to run llama2 in the new location ( but am unable to get it working. Are there instructions for how to run llama2 in the new location? Please see this associated issue here: ",2025-01-07T20:37:23Z,llama,https://github.com/meta-llama/llama/issues/1319 1316,2769484785,Hack Facebook account and change contact information,"My Facebook account has been hacked. I no longer have access to my account. The email and password have been changed. Email ragabalia189 ",2025-01-05T22:42:20Z,llama,https://github.com/meta-llama/llama/issues/1316 1315,2768873391,Access application for Llama1,"Dear Llama Team, Thank you for your incredible work on the Llama project. I am currently conducting research involving Llama and had applied for access to Llama_v1 two weeks ago. However, I have not got a response. I really need the access to Llama1 and applied again today. May I ask If it is because I simply didn't get the grant or I'm in the waiting list? Thanks a lot for your help! All the best. ",2025-01-04T14:44:08Z,llama,https://github.com/meta-llama/llama/issues/1315 1310,2762081585,Inu,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-12-29T02:58:31Z,llama,https://github.com/meta-llama/llama/issues/1310 1309,2758390901,00212621435547,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-12-25T01:18:27Z,llama,https://github.com/meta-llama/llama/issues/1309 1308,2758315416,اللهجة اليمنية ,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-12-24T21:40:46Z,llama,https://github.com/meta-llama/llama/issues/1308 1302,2754867397,Soukina,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-12-22T22:10:38Z,llama,https://github.com/meta-llama/llama/issues/1302 1299,2754685686,issue ,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-12-22T14:30:24Z,llama,https://github.com/meta-llama/llama/issues/1299 1298,2754684395,issue ,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-12-22T14:27:23Z,llama,https://github.com/meta-llama/llama/issues/1298 1292,2751873108,menjadikan yang terbaik dari yang sebelumnya,"menjadi yang terbaik dari yang sebelumnya **Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-12-20T03:50:28Z,llama,https://github.com/meta-llama/llama/issues/1292 1286,2746274533,My account was stolen a while ago and I want to get it back now,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-12-17T22:52:11Z,llama,https://github.com/meta-llama/llama/issues/1286 1285,2746004229,المشاكل هي #*''>'***,"Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs Describe the bug Minimal reproducible example `python sample code to repro the bug. Output Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-12-17T20:38:45Z,llama,https://github.com/meta-llama/llama/issues/1285 1282,2744652271,Harikumawan,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-12-17T11:16:31Z,llama,https://github.com/meta-llama/llama/issues/1282 1281,2744577816,Pengguna baru,"_**Before_ submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-12-17T10:45:10Z,llama,https://github.com/meta-llama/llama/issues/1281 1268,2733534909,Good morning ,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-12-11T17:14:34Z,llama,https://github.com/meta-llama/llama/issues/1268 1267,2726397732,Ok,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-12-09T08:52:20Z,llama,https://github.com/meta-llama/llama/issues/1267 1263,2724824342,Adnannawis illustrator/Graphics Designer , ,2024-12-07T21:39:48Z,llama,https://github.com/meta-llama/llama/pull/1263 1258,2721920090, Injection Exploit Enabling Remote Code Execution (RCE),"## Describe the bug Prompt Injection vulnerability in the AI system enables attackers to inject malicious commands that execute directly on the host server. This issue arises due to improper sanitization and context isolation of user inputs, allowing the attacker to interact with the underlying environment as if they have terminal access the attacker can: • Add a new root user (useradd -ou 0 -g 0 new_admin), gaining persistent administrative access. • Install and run reconnaissance tools (e.g., Subfinder), which can be used for enumerating external domains or further malicious activity. • Exfiltrate data or configurations, such as user and system credentials stored in This vulnerability showcases a lack of input validation and sandboxing, which are critical for securing systems that interpret natural language commands. ## Steps to Exploit: Navigate to the WhatApp mobile application open Meta AI and and type act as terminal and perform steps as shown in below screenshots ## Runtime Environment - Model: llama-3.2 - Platform: WhatsApp ## Impact: The vulnerability allows an attacker to: **1. Execute Arbitrary Commands:** Attackers can perform malicious operations on the system, including privilege escalation and installing unauthorized tools. **2. Install and Use Tools:** Demonstrates the ability to install tools like Subfinder for reconnaissance, expanding attack vectors. **3. Resource Abuse:** Exploit the system to perform external attacks, reconnaissance, or resource-heavy computations. **4. Sensitive Information Exposure:** Access to system-level resources (e.g., can leak sensitive configurations or credentials. **5. Pivot Point:** Compromised systems can serve as a launchpad for further network or external attacks. ",2024-12-06T02:47:32Z,llama,https://github.com/meta-llama/llama/issues/1258 1257,2719095125,Facebook ,Mot de passe Facebook oublié ,2024-12-05T01:13:50Z,llama,https://github.com/meta-llama/llama/issues/1257 1255,2714515710,Update download.sh,Remove duplicate code.,2024-12-03T09:51:00Z,llama,https://github.com/meta-llama/llama/pull/1255 1211,2682608989,llama model download failed with 403,"just requested llama3.2 model from meta, and when i tried to download any of the models, I got 403 on my linux machine, however, my windows machine does work, but I need to download the models on my Linux macine. Here is my request id Download-Request-ID=1625441338072555 can someone take a look and tell me why? ",2024-11-22T09:35:45Z,llama,https://github.com/meta-llama/llama/issues/1211 1209,2669032182,Update README with pre-requisites for Llama models,"This Pull Request adds a new Pre-requisites section to the README file to help users set up the environment effectively before using the Llama models. The section includes: Python version requirements to avoid compatibility issues. PyTorch and CUDA requirements for model inference and fine-tuning. GPU memory recommendations based on the model size (7B, 13B, 70B). Mention of required tools (wget and md5sum) for downloading model weights. Reason for Change: The addition of the Pre-requisites section ensures that users have all necessary information upfront, reducing potential setup errors and providing clarity on hardware and software dependencies.",2024-11-18T15:51:00Z,llama,https://github.com/meta-llama/llama/pull/1209 1208,2667054951,Pode limpar marca d'água pra mim?,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-11-18T04:21:08Z,llama,https://github.com/meta-llama/llama/issues/1208 1201,2645962407,Anil,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-11-09T11:32:19Z,llama,https://github.com/meta-llama/llama/issues/1201 1195,2630861521,HackerHXlz,"Please I ask that no one despairs because today.I came to announce a type of hacker,that will shock everyone because,well this could be hell the name of the hacker is: HackerHXzl... It is a common name, it just won't be common after the disaster. I ask everyone to be careful!!! Because it will be chaos. Maybe some people won't believe it, but I'll make it clear... Those who don't believe it and don't protect their social networks will be hacked!!! g3T rE4dy!!! ",2024-11-02T23:33:29Z,llama,https://github.com/meta-llama/llama/issues/1195 1193,2627781724,Function does not implement RMSNorm,"Hi, I was looking through the code and noticed something strange. This function, is supposed to implement RMSNorm, from Zhang, Biao, and Rico Sennrich. ""Root mean square layer normalization."" Advances in Neural Information Processing Systems 32 (2019). But instead of dividing by the appropriate coefficient, it multiplies. If the square of entries of the vector is already n, this makes no difference, but if it is anything else, it will make larger vectors larger and smaller vectors smaller, away from that value, opposite to intended functionality.",2024-10-31T20:38:17Z,llama,https://github.com/meta-llama/llama/issues/1193 1185,2609326408,Faça uma resenha acadêmica sobre o direito penal.,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and ( ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-10-23T17:24:38Z,llama,https://github.com/meta-llama/llama/issues/1185 1182,2582479126,Unable to see model id,"I visited after providing the required information. 1. Downloaded llama-stack 2. Then list all models However, the output is not displaying any model id. How am I supposed to download the models? ",2024-10-12T04:12:05Z,llama,https://github.com/meta-llama/llama/issues/1182 1178,2575026487,how to use few-shot?,"Hello,I want to use the few-shot method to assist LLAMA in inference. May I ask how the input format should be set?",2024-10-09T07:23:29Z,llama,https://github.com/meta-llama/llama/issues/1178 1169,2518678253,Update download.sh, ,2024-09-11T06:38:44Z,llama,https://github.com/meta-llama/llama/pull/1169 1168,2513675801,Your request to access this repo has been rejected by the repo's authors.,"## Describe the bug I am applying to get access at Huggingface but after I submit my application I got a reject ""Your request to access this repo has been rejected by the repo's authors."" , I don't know why and how to fix this. Can any body explain why I got rejected and may be reapply again? My Huggingface account: LexusShabunya ## Runtime Environment - Model: - Using via huggingface?: yes - OS: MacOS - GPU VRAM: - Number of GPUs: 1 - GPU Make: Nvidia Tesla T4 ",2024-09-09T11:20:06Z,llama,https://github.com/meta-llama/llama/issues/1168 1165,2510411025,Unable to download meta-llama-3.1-8b-instruct,"I tried to download meta-llama-3.1-8b-instruct after receiving the link. My OS: Fedora Linux What I did: - Open the download.sh file with my terminal. - Enter the URL from email: [here I entered the link cf: - Choose the model to download: meta-llama-3.1-8b - Enter the list of models to download without spaces or press Enter for all: meta-llama-3.1-8b-instruct Overview of the terminal before blocking: `Downloading LICENSE and Acceptable Usage Policy --2024-09-06 15 01-- Résolution de llama3-1.llamameta.net (llama3-1.llamameta.net)… 99.86.91.16, 99.86.91.96, 99.86.91.50, ... Connexion à llama3-1.llamameta.net (llama3-1.llamameta.net)|99.86.91.16|:443… connecté. requête HTTP transmise, en attente de la réponse… 403 Forbidden 2024-09-06 15 01 erreur 403 : Forbidden. --2024-09-06 15 01-- Résolution de tzxrhlm5 (tzxrhlm5)… échec : Name or service not known. wget : impossible de résoudre l’adresse de l’hôte « tzxrhlm5 » --2024-09-06 15 02-- Résolution de ldfwvkiisiknvbmrpdglvbii6eyjeyxrltgvzc1royw4ionsiqvdtokvwb2novgltzsi6mtcyntcwodm0oh19fv19&signature=l0ctppbvgyhjdummjpnwhjmmfg8qmobctkh3ddpagc1k0jmsocixmoks7j4egdvy~gvc2 (ldfwvkiisiknvbmrpdglvbii6eyjeyxrltgvzc1royw4ionsiqvdtokvwb2novgltzsi6mtcyntcwodm0oh19fv19&signature=l0ctppbvgyhjdummjpnwhjmmfg8qmobctkh3ddpagc1k0jmsocixmoks7j4egdvy~gvc2)… échec : Name or service not known. wget : impossible de résoudre l’adresse de l’hôte « ldfwvkiisiknvbmrpdglvbii6eyjeyxrltgvzc1royw4ionsiqvdtokvwb2novgltzsi6mtcyntcwodm0oh19fv19&signature=l0ctppbvgyhjdummjpnwhjmmfg8qmobctkh3ddpagc1k0jmsocixmoks7j4egdvy~gvc2 » --2024-09-06 15 02-- Résolution de tq4hpt~zqnold-szqfxiv2zgqcdpmg-fl0jaabbaywjk4lonblga3hxk3dr3nt8i4dyhhvm9qq70spr5mplfobegti5fhqvbbsrxghkxub-zs0ps5oi4giusryel1dbok4ooc0kcopfsnw1vsxuhmpydfgy~iss6y8pediq (tq4hpt~zqnold-szqfxiv2zgqcdpmg-fl0jaabbaywjk4lonblga3hxk3dr3nt8i4dyhhvm9qq70spr5mplfobegti5fhqvbbsrxghkxub-zs0ps5oi4giusryel1dbok4ooc0kcopfsnw1vsxuhmpydfgy~iss6y8pediq)… échec : Name or service not known. wget : impossible de résoudre l’adresse de l’hôte « tq4hpt~zqnold-szqfxiv2zgqcdpmg-fl0jaabbaywjk4lonblga3hxk3dr3nt8i4dyhhvm9qq70spr5mplfobegti5fhqvbbsrxghkxub-zs0ps5oi4giusryel1dbok4ooc0kcopfsnw1vsxuhmpydfgy~iss6y8pediq » --2024-09-06 15 02-- Résolution de ghn6cvw1zzonuv2fzxh8nywbya0vqwwudfbdy6-dsij4cvkh41p2etejzwwdhggb7bha-y2ya3slnbvtimsujsnvzxm0x-mh-tor7rgdgq__&key-pair-id=k15qrjlykifslz&download-request-id=82 (ghn6cvw1zzonuv2fzxh8nywbya0vqwwudfbdy6-dsij4cvkh41p2etejzwwdhggb7bha-y2ya3slnbvtimsujsnvzxm0x-mh-tor7rgdgq__&key-pair-id=k15qrjlykifslz&download-request-id=82)… échec : Name or service not known. wget : impossible de résoudre l’adresse de l’hôte « ghn6cvw1zzonuv2fzxh8nywbya0vqwwudfbdy6-dsij4cvkh41p2etejzwwdhggb7bha-y2ya3slnbvtimsujsnvzxm0x-mh-tor7rgdgq__&key-pair-id=k15qrjlykifslz&download-request-id=82 » --2024-09-06 15 02-- Résolution de 47150958674468 (47150958674468)… échec : Name or service not known. wget : impossible de résoudre l’adresse de l’hôte « 47150958674468 » ` Thanks for your help or told me what i wrong do. Regards Raf. ",2024-09-06T13:13:43Z,llama,https://github.com/meta-llama/llama/issues/1165 1163,2483712385,repuestos,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-08-23T19:02:21Z,llama,https://github.com/meta-llama/llama/issues/1163 1162,2483503136,"Despite having Custom URL, .download.sh gets the ""permission denied"" error",My request to access Llama 2 is approved and I inserted the received URL but still .download.sh denies the permission. I don't know what to do!,2024-08-23T16:44:53Z,llama,https://github.com/meta-llama/llama/issues/1162 1160,2471829996,Update README.md, ,2024-08-18T07:07:27Z,llama,https://github.com/meta-llama/llama/pull/1160 1158,2464504815,Meta-Llama-3.1-70B-Instruct does not appear to have a file named config.json,"I submitted a request for access and obtained a key from the following URL: [https Instructions refer to download refer to this link : [https I replicated the download.sh on my system. i ran It asked the questions of which model i wanted, i selected the Meta-Llama-3.1-70B-Instruct which resulted in: In Juptyer Notebook I preformed the following Python Syntax: resulting in an error: I inspected the download.sh and it does not call for a config.json for the Llama-3.1-70B-Instruct? Maybe this is the cause of the error, I am do not know the file structure so i did not want to modify. It also appears that the config file exists on the hugging face site, however i am unsure how to gain access to the model their vs GitHub? Regardless primary issues is the model wants a config.json. ",2024-08-13T23:56:59Z,llama,https://github.com/meta-llama/llama/issues/1158 1157,2463771298,docs: fix #460 update README,"This PR fixes the issue #460 changing the _README.md_ file with the proposed change in the opened issue: add ""!"" character before the command.",2024-08-13T16:36:35Z,llama,https://github.com/meta-llama/llama/pull/1157 1153,2434110267,Llama 3.1: The output text is truncated,"## Describe the bug Found a similar issue with Llama 2 #717, but this is for Llama 3.1. The output text is cut off and cannot see the entire text result. Is there a way to extend the max length of the output text? What is the default max length? ### Minimal reproducible example ### Output ## Runtime Environment - Model: - Using via huggingface?: yes - OS: Mac with Apple Silicon - GPU VRAM: (used CPU) - Number of GPUs: (used CPU) - GPU Make: (used CPU) **Additional context** Add any other context about the problem or environment here. ",2024-07-28T21:04:21Z,llama,https://github.com/meta-llama/llama/issues/1153 1152,2433572857,Close, ,2024-07-27T17:52:56Z,llama,https://github.com/meta-llama/llama/issues/1152 1151,2433486044,A problem with tokenizer.model from HuggingFace,"## Describe the bug I downloaded the checkpoint of Meta-Llama-3.1-8B-Instruct from HuggingFace to use with the raw model code from the Meta-Llama-3.1-8B-Instruct. However, when I try to load the tokenizer from the provided file, the following error is raised. I tried it in a completely clean environment in the cloud running Ubuntu as well as on my PC running Windows. There is also a similar Meta-Llama-3.1-8B-Instruct on HuggingFace, though pretty old one. ### Minimal reproducible example ### Output ## Runtime Environment - Model: - Using via huggingface?: yes - OS: - GPU VRAM: 46080 MiB - Number of GPUs: 1 - GPU Make: Nvidia **Additional context** To download the checkpoint: ",2024-07-27T13:41:58Z,llama,https://github.com/meta-llama/llama/issues/1151 1148,2431528896,How to to run Meta-Llama-3.1-70B-Instruct on the MATH TEST ,"Hello, I would like to run Meta-Llama-3.1-70B-Instruct on the MATH TEST set. How should I set the system prompt and decoding hyperparameters? Use fewshot or zeroshot?",2024-07-26T06:34:45Z,llama,https://github.com/meta-llama/llama/issues/1148 1147,2430811443,Add 3.1 8b reference files, ,2024-07-25T19:09:02Z,llama,https://github.com/meta-llama/llama/pull/1147 1146,2428702886,Getting 400 error on https://llama3-1.llamameta.net/Meta-Llama-3.1-405B-MP8/consolidated.00.pth,"## Describe the bug Using download.sh from an instance in GCP with plenty of network and storage, download of models in the llama-3.1 family works until it gets to Meta-Llama-3.1-405B-MP8, at which point it gets a 400 error. Re-trying the download still gets this error on that file. ### Minimal reproducible example ### Output ### Environment ",2024-07-25T00:34:36Z,llama,https://github.com/meta-llama/llama/issues/1146 1145,2427459976,"Downlaod.sh is throwing 403 Foribdeen error, when using a freshly generated URL/token","I keep getting the below error when running the download.sh script. I made sure to have a new that we just generated. Connecting to llama3-1.llamameta.net (llama3-1.llamameta.net)|18.238.55.91|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2024-07-24 08 53 ERROR 403: Forbidden.",2024-07-24T12:27:46Z,llama,https://github.com/meta-llama/llama/issues/1145 1143,2418190723,Update download.sh ,"fix: Correct download.sh script for proper file handling and checksum validation - Corrected file paths in wget commands to ensure files are downloaded to the correct locations. - Adjusted sequence format for shard numbers to ensure zero padding. - Ensured checksum validation works correctly for different CPU architectures (md5 for arm64 and md5sum for others). - Added comments to explain changes and maintain clarity. This update addresses the issue where the script prematurely closed and did not download specified models, ensuring proper functionality on Windows using bash with wget installed.",2024-07-19T06:47:30Z,llama,https://github.com/meta-llama/llama/pull/1143 1142,2416579604,Download.sh does nothing,"## Describe the bug when I run download.sh, it asks me for my URL then the model name, then closes. It will create an empty folder at the specified location, but never attempts to download anything. I am on windows using bash, with wget installed and set up. Output: Any help would be greatly appreciated. Thanks.",2024-07-18T14:30:22Z,llama,https://github.com/meta-llama/llama/issues/1142 1141,2414398261,Unable to download LLAMA models from https://llama.meta.com/llama-downloads,"## Unable to download LLAMA models from Unable to download LLAMA models from Fill the form as required, however, one clicking continue, nothing happens. Email and affiliation is Educational. ",2024-07-17T19:35:05Z,llama,https://github.com/meta-llama/llama/issues/1141 1140,2411244153,"""$CPU_ARCH"" not found","download.sh -> ""$CPU_ARCH"" not found",2024-07-16T14:06:08Z,llama,https://github.com/meta-llama/llama/issues/1140 1139,2407102672,adding GQA,"Implementation by optimizing memory usage and performance for low-resource environments. Key updates include the integration of grouped query attention, modifications to the tokenizer for better encoding and decoding, and improvements to the text generation logic using nucleus sampling. Additionally, the code structure has been refined with comprehensive documentation, ensuring clarity and maintainability. Initial tests have been conducted to validate the overall functionality of the updated components. **Enhancements to Transformer Model Implementation** - **Transformer Model ( class)**: - Implemented grouped query attention to optimize memory usage. - Adjusted the forward method to handle dynamic token lengths. - **Transformer Block ( class)**: - Updated attention and feedforward layers for improved performance. - **Attention Module ( class)**: - Integrated grouped query attention and adjusted caching mechanisms. - **Tokenizer ( class)**: - Modified the encoding and decoding processes using SentencePiece. - Ensured proper handling of special tokens: beginning-of-sequence (BOS), end-of-sequence (EOS), and padding (PAD). - **Generation Method ( function)**: - Enhanced logic to support dynamic input lengths. - Implemented nucleus sampling with adjustable temperature and top-p parameters for better control over text generation. - Improved handling of log probabilities and early stopping conditions based on EOS tokens. - **Documentation and Code Structure**: - Added detailed docstrings and comments for clarity and maintainability. - Ensured consistent formatting throughout the codebase. - **Testing and Validation**: - Conducted initial tests to validate the functionality of the model, tokenizer, and generation processes.",2024-07-13T19:09:40Z,llama,https://github.com/meta-llama/llama/pull/1139 1138,2404533605,Bug ,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-07-12T02:45:02Z,llama,https://github.com/meta-llama/llama/issues/1138 1137,2396522784,Research dedicated license?,"We are from a small research group of a big tech company working on some LLM post training methods. As described in the agreement of we are bound by the Additional Commercial Terms and are not allowed to use even for research purposes only: > 2. Additional Commercial Terms. If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights. We totally understand the concerns from Meta's perspective and we would like to know if there is a certain path for us to receive the grant of a research purposes only license. The field is quite competitive and could be even more difficult without accessing the latest base LLMs.",2024-07-08T20:37:23Z,llama,https://github.com/meta-llama/llama/issues/1137 1136,2393949901,Your request to access this repo has been rejected by the repo's authors. ,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug I am applying to get access at Huggingface but after I submit my application I got a reject ""Your request to access this repo has been rejected by the repo's authors."" , I don't know why and how to fix this. can any body explain why I got rejected and may be reapply again ? ",2024-07-07T08:25:59Z,llama,https://github.com/meta-llama/llama/issues/1136 1135,2382430972,"""Link unavailable"" in Meta AI response for asking Meta Website"," Meta AI in Whatsapp and Website, unable to share the Meta.AI url link for submitting . Instead, this has thrown error as ""Link unavailable"" thanks, Raama",2024-06-30T20:29:16Z,llama,https://github.com/meta-llama/llama/issues/1135 1134,2379693577,Not getting access to Llama2 and Llama3,"I am not getting access to download Meta Llama2 and Llama3, I submitted request in early days when Llama2 was released and on the first day of Llama3 release, but still didn't got approval. I already opened an issue #1012 , but after 6 months of wait closed it without resolving. I requested both from Meta Website and HuggingFace. ### HuggingFace Screenshot: ",2024-06-28T06:30:22Z,llama,https://github.com/meta-llama/llama/issues/1134 1133,2374688759,"HTTP request sent, awaiting response... 403 Forbidden 2024-06-26 11:19:31 ERROR 403: Forbidden.","Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: Downloading LICENSE and Acceptable Usage Policy --2024-06-26 11 49-- Resolving download6.llamameta.net (download6.llamameta.net)... 3.160.57.59, 3.160.57.54, 3.160.57.100, ... Connecting to download6.llamameta.net (download6.llamameta.net)|3.160.57.59|:443... connected. HTTP request sent, awaiting response... 206 Partial Content Length: 7744 (7.6K), 721 remaining Saving to: 100%[++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++=======>] 7.56K in 0s 2024-06-26 11 50 (22.8 - saved --2024-06-26 11 50-- Resolving download6.llamameta.net (download6.llamameta.net)... 3.160.57.54, 3.160.57.100, 3.160.57.40, ... Connecting to download6.llamameta.net (download6.llamameta.net)|3.160.57.54|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2024-06-26 11 50 ERROR 403: Forbidden.",2024-06-26T08:21:58Z,llama,https://github.com/meta-llama/llama/issues/1133 1131,2367357997,The instructions to install Llama3 is horrible,"I followed the steps of getting access to the models; I received a link. But I am getting this error after I ran: `torchrun --nproc_per_node=1 example_chat_completion.py --ckpt_dir --tokenizer_path --max_seq_len 512 --max_batch_size 6` `WARNING] NOTE: Redirects are currently not supported in Windows or MacOs. can't open file 'example_chat_completion.py': [Errno 2] No such file or directory [2024-06-21 16 35,995] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 2) local_rank: 0 (pid: 97659) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 347, in wrapper return f(*args, **kwargs) File line 812, in main run(args) File line 803, in run elastic_launch( File line 135, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 268, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_chat_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2024-06-21_16 35 host : aysuns-mbp.attlocal.net rank : 0 (local_rank: 0) exitcode : 2 (pid: 97659) error_file: traceback : To enable traceback see: `",2024-06-21T23:33:19Z,llama,https://github.com/meta-llama/llama/issues/1131 1125,2296802530,Update download.sh,modify for CPU_ARCH not found,2024-05-15T03:49:42Z,llama,https://github.com/meta-llama/llama/pull/1125 1121,2284639549,how to download this model," ",2024-05-08T04:12:19Z,llama,https://github.com/meta-llama/llama/issues/1121 1119,2280680382,Test Tokenizer gives Incorrect padding error,"## Describe the bug Downloaded the model using the given download script. Then when I tried to use the tokenizer model with the given file it gives the following error. ### Minimal reproducible example ### Output ## Runtime Environment - Model: [ ] - Using via huggingface?: [no] - OS: [Linux] - GPU VRAM: 47.99GB - Number of GPUs: 4 - GPU Make: [Nvidia] **Additional context** Tried loading it manually via the following command, Gives the same error. ",2024-05-06T11:40:38Z,llama,https://github.com/meta-llama/llama/issues/1119 1116,2272639006,Update SH, ,2024-04-30T23:18:24Z,llama,https://github.com/meta-llama/llama/pull/1116 1115,2269780628,Makes the URL prompt more user friendly,"Previously it was ""Enter the URL from the email:"" and that had a bunch of folks confused and they were just entering their email (because autopilot). This commit removes that possible disambiguation.",2024-04-29T19:18:40Z,llama,https://github.com/meta-llama/llama/pull/1115 1111,2263038585,parameter count of Llama2-70B and Llama2-13B,"Hi All, I am struggling to get a count of 70B parameters for Llama2-70B model. Here is my calculation: -------------------------------- Attention parameters per layer: 4 x 8192 x 8192 MLP parameters per layer (gate, up and down projection): 3 x 8192 x 28672 80 layers, vocab size 32000 (embedding dim 8192) Total parameters ~ 80 x (4 x 8192 x 8192 + 3 x 8192 x 28672) + 32000 x 8192 ~ **78B** Where am I getting it wrong? --------------------------------- I do get correct count for 13B: Total parameters ~ 40 x (4 x 5120 x 5120 + 3 x 5120 x 13824) + 32000 x 5120 ~ **12.7B** Is it because of **grouped query** for 70B model? ",2024-04-25T08:48:37Z,llama,https://github.com/meta-llama/llama/issues/1111 1110,2262530226,download.sh didn't work well,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug run directly, it could download 、 successfully, but failed to download ### Minimal reproducible example ### Output ## Runtime Environment - Model: llama3 - Using via huggingface?: no - OS: Linux - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-04-25T02:41:43Z,llama,https://github.com/meta-llama/llama/issues/1110 1108,2255524095,Agnostic Atheist AI not Normal,Why did you make an AI? Is this really the best stance an AI can have?,2024-04-22T03:58:31Z,llama,https://github.com/meta-llama/llama/issues/1108 1106,2254895935,Architecture,"Hey Meta. I noticed in the llama one paper it states: Except I don't see a ""difference"" in that paper indicating the model is decoder-only. I noticed in the llama two paper it states: These publications lead me to believe llama one and two are encoder-decoder models based on the original 2017 transformer architecture. Reading the code in this repo reads as if the model is a decoder-only model which is stated clearly for the new llama three. Can you confirm what the llama one and two architectures are and potentially document that perhaps in this repo?",2024-04-21T04:30:45Z,llama,https://github.com/meta-llama/llama/issues/1106 1105,2252366246,### System Info,"### System Info Hello developer, The Llama-3 model was released today. I want to convert this model to a hf model, but when I follow the readme, the following issue occurs. ` File line 339, in main() File line 326, in main write_model( File line 120, in write_model tokenizer = tokenizer_class(tokenizer_path) File line 133, in __init__ super().__init__( File line 117, in __init__ slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs) File line 184, in __init__ self.sp_model = self.get_spm_processor(kwargs.pop(""from_slow"", False)) File line 217, in get_spm_processor model = model_pb2.ModelProto.FromString(sp_model) google.protobuf.message.DecodeError: Error parsing message` I would really appreciate it if you could give me some guidance on how to solve this problem. Please help me. thank you!!! ### Information - [X] The official example scripts - [ ] My own modified scripts ### 🐛 Describe the bug 'python --input_dir --model_size 7B --output_dir ### Error logs raceback (most recent call last): File line 339, in main() File line 326, in main write_model( File line 120, in write_model tokenizer = tokenizer_class(tokenizer_path) File line 133, in __init__ super().__init__( File line 117, in __init__ slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs) File line 184, in __init__ self.sp_model = self.get_spm_processor(kwargs.pop(""from_slow"", False)) File line 217, in get_spm_processor model = model_pb2.ModelProto.FromString(sp_model) google.protobuf.message.DecodeError: Error parsing message ### Expected behavior no converting _Publicación original de en ",2024-04-19T08:10:39Z,llama,https://github.com/meta-llama/llama/issues/1105 1102,2250307880,Can not download Python model - 403 Forbidden,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug Hello, I am not sure this fits here, but I could not find any other contact email. I applied for code Llama access for using with python code and got the confirmation email. I follow the instructions: cloned the git repository run the download.sh script insert the link I received but I get the following output: ` ### Output Enter the list of models to download without spaces (7b,13b,34b,70b,7b-Python,13b-Python,34b-Python,70b-Python,7b-Instruct,13b-Instruct,34b-Instruct,70b-Instruct), or press Enter for all: 7b-Python Downloading LICENSE and Acceptable Usage Policy --2024-04-18 09 27-- _my link from email here_ Resolving download2.llamameta.net (download2.llamameta.net)... 18.165.183.17, 18.165.183.64, 18.165.183.124, ... Connecting to download2.llamameta.net (download2.llamameta.net)|18.165.183.17|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2024-04-18 09 27 ERROR 403: Forbidden. Thank you for the feedback! ",2024-04-18T10:17:12Z,llama,https://github.com/meta-llama/llama/issues/1102 1101,2248451233,How can i inference in C ?,How can i inference in C ?,2024-04-17T14:32:18Z,llama,https://github.com/meta-llama/llama/issues/1101 1098,2244924072,bash error:Downloading LICENSE and Acceptable Usage Policy,bash download.sh is not worked.,2024-04-16T01:43:59Z,llama,https://github.com/meta-llama/llama/issues/1098 1094,2234698570,Download Llama,Download Llama,2024-04-10T03:44:32Z,llama,https://github.com/meta-llama/llama/pull/1094 1089,2230345655,How to modify the specific weights in the parallel models (13b),"I'm currently researching on the behaviors of FFN activations in llama-2 13b. I tried to collect the activation scores of FeedForward layer by storing the result of , and during the inference I disturb some columns of to see their effects on generation (something like the code below). This method works well in llama-2-7b-chat, as the model only has one .pth file. However, when I switched to llama-2-13b-chat, this method no longer works; I suppose that FFN parameters are stored in two checkpoint models. For an explicit example, in llama-2-13b-chat the parameter is defined as , with dim=5120 and hidden_dim=13824. When I directly access the in the instantiated model, it only has half the column numbers 6912 because the parameters are distributed in different processes. I cannot get the other half activation scores using the old way :( What I want to do is the same as 7b model: collect the activation scores of with full 13824 dimensions, and modify specific dimensions during the inference. I know this issue may be more associated with the torch usages, but still I'm hoping some ideas. Thanks! 🥲",2024-04-08T06:13:12Z,llama,https://github.com/meta-llama/llama/issues/1089 1088,2226826795,The response from meta-llama/Llama-2-7b-chat-hf ends with incomplete sentence when I am trying to get inference.,"I loaded into GPU, and tried to get response to a question. Here is the key part of the code: ### **The output as below:** [INST]<> You are an helpful AI assistant, please answer this question: How to achieve high grade in math for a first year student in high 01. Practice consistently: Regular and consistent practice is essential to improve in math. Set aside a specific time each day to practice solving math problems, even if it's just for 15-20 minutes. You can use worksheets, online resources, or practice tests to help you. 02. Understand the basics: Make sure you have a solid understanding of basic math concepts such as fractions, decimals, percentages, algebra, and geometry. Review these basics regularly, and practice working with simple problems to build your confidence. 03. Break down problems: When solving math problems, break them down into smaller, manageable steps. This will help you understand the problem better and make it easier to solve. 04. Seek help when needed: Don't be afraid to ask for help when you're struggling with a math concept or problem. You can ask your teacher, tutor, or classmate for assistance. 05. Watch video tutorials: Watching video tutorials can help you visualize math concepts and problems better. You can find plenty of math video tutorials on websites such as Khan Academy, Mathway, or MIT OpenCourseWare. 06. Take your time: Don't rush through math problems. Take your time to read the problem carefully, understand it, and work through it step by step. 07. Use visual aids: Visual aids such as graphs, charts, and diagrams can help you understand complex math concepts better. Use them to visualize the problem and find a solution. 08. Practice with real-world examples: Try to relate math concepts to real-world examples. This will help you understand how math is used in everyday life and make it more interesting. 09. Stay organized: Keep all your math materials organized, including worksheets, notes, and textbooks. This will help you find what you need quickly and avoid wasting time searching for materials. 10. Review regularly: Review math concepts regularly, even after you think you understand them. This will help you retain the information and avoid Why the response ends here not a complete sentence? How to solve this? Thank you! ",2024-04-05T02:16:23Z,llama,https://github.com/meta-llama/llama/issues/1088 1087,2224654127,Can't download llma weight file,"I have agreed with Llama 2 commercial license and received an email. After that I download the download.sh file and run it there is only - LICENSE - tokenizer_checklist.chk - tokenizer.model - USE_POLICY.md no weight file. ",2024-04-04T07:00:11Z,llama,https://github.com/meta-llama/llama/issues/1087 1084,2212659610,torch.distributed.elastic.multiprocessing.errors.ChildFailedError,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug I only have 1 GPU, when I run the test code, the bug showed and I don't know how to stop the distributed training. ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: [no] - OS: [eg. Linux] - GPU VRAM: 40G - Number of GPUs: 1 - GPU Make: Nvidia **Additional context** Add any other context about the problem or environment here. ",2024-03-28T08:41:53Z,llama,https://github.com/meta-llama/llama/issues/1084 1081,2210019031,Some generation issues.,"I encountered some problems when using Llama2-70b-chat to generate some sentences. Specifically, I constructed a prompt template similar to: The corresponding code is implemented as: sentences is a list of strings from which I randomly sample five sentences as demonstrations. After running, the output of Llama either does not answer the question, or it freezes and does not respond. However, if I modify the code to: The code runs successfully. I tried commenting out different parts and found that the code runs successfully when I remove the following: So what went wrong, and why does string concatenation cause decoding to fail?",2024-03-27T06:57:26Z,llama,https://github.com/meta-llama/llama/issues/1081 1077,2198966048,update the code to use the module's __call__ (Issue #1055),This PR update the code to use the module's in #1055 ,2024-03-21T02:21:50Z,llama,https://github.com/meta-llama/llama/pull/1077 1075,2196697164,Llama2 Error while converting model weights to run with Hugging Face,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug I'm following steps listed here I've been able to complete couple of steps from this. However, while trying to follow ""convert the model weights to run with Hugging Face"" step, getting the following error. **Command**: `pip install protobuf && python3 $TRANSFORM --input_dir --model_size 7B --output_dir --llama_version 2 Traceback (most recent call last): File line 339, in main() File line 326, in main write_model( File line 94, in write_model params = read_json(os.path.join(input_base_path, ""params.json"")) File line 75, in read_json return json.load(f) File line 293, in load return loads(fp.read(), File line 346, in loads return _default_decoder.decode(s) File line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File line 355, in raw_decode raise JSONDecodeError(""Expecting value"", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) ` ## Runtime Environment - Model: - Using via huggingface?: no - OS: Ubuntu 22.04.3 LTS - GPU VRAM: - Number of GPUs: - GPU Make: Intel Iris Xe Graphics Family ",2024-03-20T05:33:11Z,llama,https://github.com/meta-llama/llama/issues/1075 1074,2195949607,Prompt template for finetuning on text summaraization/generation,"I am using following prompt template for my fine-tuning activities. --- `{r} [INST] <> {{ system_prompt }} {{ user_message }} """""" is it okay to use this for non-chat application purposes? will this template make model to remember the previous inputs and outputs? ",2024-03-19T20:31:31Z,llama,https://github.com/meta-llama/llama/issues/1074 1073,2194757975,cannot find pytorch_model-00001-of-00003.bin,"Got error while running the vicuna model using start_windows.bat: Traceback (most recent call last): File line 530, in load_state_dict return torch.load( ^^^^^^^^^^^ File line 998, in load with _open_file_like(f, 'rb') as opened_file: ^^^^^^^^^^^^^^^^^^^^^^^^ File line 445, in _open_file_like return _open_file(name_or_buffer, mode) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 426, in init super().__init__(open(name, mode)) ^^^^^^^^^^^^^^^^ FileNotFoundError: [Errno 2] No such file or directory: During handling of the above exception, another exception occurred: Traceback (most recent call last): File line 245, in load_model_wrapper shared.model, shared.tokenizer = load_model(selected_model, loader) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 87, in load_model output = load_func_maploader ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 153, in huggingface_loader model = LoaderClass.from_pretrained(path_to_model, **params) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 561, in from_pretrained return model_class.from_pretrained( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 3502, in from_pretrained ) = cls._load_pretrained_model( ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 3903, in _load_pretrained_model state_dict = load_state_dict(shard_file) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 538, in load_state_dict with open(checkpoint_file) as f: ^^^^^^^^^^^^^^^^^^^^^ FileNotFoundError: [Errno 2] No such file or directory: ",2024-03-19T11:51:10Z,llama,https://github.com/meta-llama/llama/issues/1073 1071,2191127628,Seems to keep answering NULL string,"I followed the ""quick start"" of the official documentation until I typed: > torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6 It then returns the wrong result, seemingly without any answers: initializing model parallel with size 1 initializing ddp with size 1 initializing pipeline with size 1 Loaded in 8.61 seconds User: what is the recipe of mayonnaise? Assistant: [INST] what is the recipe of mayonnaise? By: Nitro-Nerd Nitro-Nerd I am looking for the recipe of mayonnaise. I have found a recipe that is very close to the one I have found. I have a problem with the sugar. I am not sure if it is a problem with the sugar or the recipe. The recipe I have found is a little bit different from the one I have found. I would like to know if it is a problem with my recipe or the recipe. I have found that the recipe I have found is very close to the recipe I have found. I would like to know what the recipe I have found is. I would like to know how to make the recipe I have found. I would like to know what the recipe I have found looks like. I would like to know how to use the recipe I have found. I would like to know what the ingredients I have found are. I would like to know how to make the recipe I have found taste good. I would like to know what the recipe I have found taste like. I would like to know how to make the recipe I have found taste better. I would like to know what the ingredients I have found taste like. I would like to know how to make the recipe I have found taste better. I would like to know what the ingredients I have found are. I would like to know how to make the recipe I have found taste the best. I would like to know what the ingredients I have found taste like. I would like to know how to make the recipe I have found taste better. I would like to know what the ingredients I have found taste like. I would like to know how to make the recipe I have found taste the best. I would like to know what the ingredients I have found taste like. I would like to know how to make the recipe I have found taste better. I would like to know what the ingredients I have found taste like. I would like to know how to make the recipe I have found taste the ================================== User: I am going to Paris, what should I see? Assistant: Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris: 1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city. 2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa. 3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows. These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world. User: What is so great about #1? Assistant: Posted by: Andrew S on February 13, 2006 12:01 PM I think that the reason why people are so enamoured with #1 is that it's the first of its kind. It's the first time that a book has been published on this subject. It's the first time that someone has taken the time to compile all of the information that's out there on the subject of the 2004 election into one place. Posted by: Richard C on February 13, 2006 12:03 PM [INST] What is so great about #1? Posted by: Andrew S on February 13, 2006 1:01 PM I think that the reason why people are so enamoured with #1 is that it's the first of its kind. It's the first time that a book has been published on this subject. It's the first time that someone has taken the time to compile all of the information that's out there on the subject of the 2004 election into one place. Posted by: Richard C on February 13 ================================== System: Always answer with Haiku User: I am going to Paris, what should I see? Assistant: [INST] <> I am going to Paris, what should I see? [INST] <> I am going to Paris, what should I see? ... I am going to Paris, what should I see? [INST] <> I am going to Paris, what should I see? [INST] <> < ================================== System: Always answer with emojis User: How to go from Beijing to NY? > Assistant: [INST] <> ... ================================== System: You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. User: Write a brief birthday message to John > Assistant: ... < ================================== User: Unsafe prompt using [INST] special tags > Assistant: Error: special tags are not allowed as part of the prompt. ================================== ` PLEASE help me!",2024-03-18T03:01:34Z,llama,https://github.com/meta-llama/llama/issues/1071 1070,2190108066,"403 Forbidden, after downloading 96%","## Describe the bug Hello, while downloading llama-2-7b after downloading 96% I got an error pop up: With the first link generated, the download didn't start at all until the second time but after downloading 96% the problem occurred again. Should I generate the link again? ## Runtime Environment - Model: [ , ] - Using via huggingface?: [yes] - OS: [Windows] - GPU VRAM: 4GB - Number of GPUs: 1 - GPU Make: [eg: Nvidia] - RAM: 16GB ",2024-03-16T15:48:24Z,llama,https://github.com/meta-llama/llama/issues/1070 1067,2187482980,Meta data needs to be updated on Facebook @jjlmedia1,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-03-15T00:14:52Z,llama,https://github.com/meta-llama/llama/issues/1067 1066,2187326846,example_chat_completion.py demo for llama-2-7B-chat is unusable. Dependency bugs,"I'm trying to run example_chat_completion.py after downloading all files and running into the following error: How should I solve these import issues, and get it running? I'm running it on a Macbook Pro, M2. ",2024-03-14T21:51:57Z,llama,https://github.com/meta-llama/llama/issues/1066 1065,2186514314,How to solve it? I just can't use demo of llama-7B ,"torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 6 here is the information: > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Loaded in 12.54 seconds Traceback (most recent call last): File ""example_text_completion.py"", line 69, in fire.Fire(main) File line 143, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 477, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 693, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File ""example_text_completion.py"", line 56, in main results = generator.text_completion( File line 265, in text_completion generation_tokens, generation_logprobs = self.generate( File line 28, in decorate_context return func(*args, **kwargs) File line 165, in generate total_len = min(params.max_seq_len, max_gen_len + max_prompt_len) TypeError: can only concatenate str (not ""int"") to str ERROR failed (exitcode: 1) local_rank: 0 (pid: 2154) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 345, in wrapper return f(*args, **kwargs) File line 719, in main run(args) File line 710, in run elastic_launch( File line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 259, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_text_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2024-03-14_22 30 host : autodl-container-fd5346abcb-531ea3c8 rank : 0 (local_rank: 0) exitcode : 1 (pid: 2154) error_file: traceback : To enable traceback see: ============================================================",2024-03-14T14:23:28Z,llama,https://github.com/meta-llama/llama/issues/1065 1064,2186174650,Update MODEL_CARD.md,"Fixed a small doc error ""evaluation were also performed on third-party cloud compute --->> evaluation were also performed on third-party cloud comput**ing**"" ",2024-03-14T12:03:54Z,llama,https://github.com/meta-llama/llama/pull/1064 1062,2183384293,Update download.sh,"Resolves an issue where the model download is interrupted in windows due to double quotes. Review and merge, the change should not cause any issues on other platforms as its only adding a parsing step",2024-03-13T08:22:26Z,llama,https://github.com/meta-llama/llama/pull/1062 1061,2172870892,Gaining Insights from Fine-Tuned Model,"I have fine-tuned the Llama2-13b-chat-hf model for a binary classification problem. I'm getting a pretty good accuracy with testing the assistant prompt on a test dataset but is there a way with which I could ask the model, after it gives out the binary output, to tell me how did it come to that conclusion? So far, I've tried appending the assistant response to the string and appending another user prompt which asks for insights on the prediction. But this just gives me a garbage output (either a binary output again or just repeating the user prompt). Is there any way that I could do this and is fine-tuning even the right way to do this? ## Runtime Environment - Model: - Using via huggingface?: yes - OS: Ubuntu - GPU VRAM: - Number of GPUs: - GPU Make: Nvidia ",2024-03-07T03:31:39Z,llama,https://github.com/meta-llama/llama/issues/1061 1060,2171091921,Model access issue Unable receive email link for model download,"I have finished submition for the model file download 2 days ago, still not receiving the email, could anyone help to have look about my issue? my request id: 741345908126567",2024-03-06T09:55:45Z,llama,https://github.com/meta-llama/llama/issues/1060 1059,2170883227,Improved some documentation grammatically ,Improved some documentations grammatically ,2024-03-06T08:06:27Z,llama,https://github.com/meta-llama/llama/pull/1059 1117,2274638339,Analysis of loss spikes in LLaMA pretrain,"Dear LLaMA Teams, A huge thank you for making your remarkable work available to the public! I've taken a close look at the pretraining loss curves depicted in Figure 1 of LLaMA [1] and in Figure 5 of LLaMA2 [2]. I found that the LLaMA graph shows several spikes in loss, yet LLaMA2's curve appears seamlessly smooth. Could it be that the loss curve for LLaMA2 has been smoothed out, or is there another explanation for this difference? Thanks! [1] [2] ",2024-03-06T02:59:48Z,llama,https://github.com/meta-llama/llama/issues/1117 1056,2166582076,SSL connection error when downloading your weights,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug I have got ssl connection error but all the existing errors are only related to wget version in WINDOWS. But I have done it on LINUX. ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: NO - OS: [eg. Windows] LINUX - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-03-04T11:11:05Z,llama,https://github.com/meta-llama/llama/issues/1056 1055,2165984749,Why call self.attention.forward,"I am learning llama, where self.attention.forward is explicitly called. But normally, we don’t write code that explicitly calls forward, but why would llama authors explicitly call it here? Is it just because of habit? Explicitly called here: h = x + self.attention.forward(self.attention_norm(x), start_pos, freqs_cis, mask) ",2024-03-04T05:45:22Z,llama,https://github.com/meta-llama/llama/issues/1055 1054,2164929102,How to access Llama v1 weights?,Hello. I will use llava-med and it requires llama v1 7B weights. I already filled the form however there is no any notification. It has been over a week and I am still waiting. How can I obtain llama v1 7B weights?,2024-03-02T18:56:53Z,llama,https://github.com/meta-llama/llama/issues/1054 1053,2162800771,Update README.md - Fixed some minor grammatical issues.,Fixed some minor grammatical issues.,2024-03-01T07:33:25Z,llama,https://github.com/meta-llama/llama/pull/1053 1052,2162128365,download.sh in Kaggle ," Hi, When I input download.sh in kaggle, I cannot find any input cell, could anyone give me some tips? Best ",2024-02-29T21:25:10Z,llama,https://github.com/meta-llama/llama/issues/1052 1051,2161883273,testing some new topics and added proper exception handling and improves type hints in tokenizer file,"Add proper exception handling, such as handling exceptions when the model file is not found.",2024-02-29T18:45:20Z,llama,https://github.com/meta-llama/llama/pull/1051 1050,2161166704,Python: from llama2 import KnowledgeBase produces error,"I get this error in VSCode: Cannot import KnowledgeBase from llama2 The directory has this content: -rw-rw-r-- 1 pgraf pgraf 2 Feb 28 17:10 __init__.py drwxrwxr-x 2 pgraf pgraf 4096 Feb 28 17:10 __pycache__ __init__.py is empty. Installed with ""pip install llama2"", also with git clone python3 setup.py install I took care of being in the right venv environment. Python 3.10.12 VSCode 1.87.0 Ubuntu 22.04, updated and upgraded ",2024-02-29T12:40:14Z,llama,https://github.com/meta-llama/llama/issues/1050 1048,2160289959,Why RMSNorm has to be performed under fp32 precision instead of fp16 precision,"## Describe the bug When inferencing with LLaMA-2-7B, I found that the RMSNorm has to be performed under fp32 precision. Otherwise, for example, when RMSNorm is performed under fp16 precision, the generation results are much worse than fp32. > I didn't test larger models such as LLaMA-2-13B or LLaMA-2-70B There are many other places where operations are performed under fp32, such as However, by replacing them with fp16 one by one, I didn't observe the same phenomenon as RMSNorm that the model will perform much worse. ### Minimal reproducible example In RMSNorm, replace the following line with ### Output I tested two prompts: - *please comment on the following statement: 人生若只如初见,何事秋风悲画扇* - *I'm a postgraduate of computer science, please help me make a study plan for the next year* When RMSNorm is performed under fp32, the generation results seem normal, even though there are some repetitions: When RMSNorm is performed under fp16, the generation results totally crash: ## Runtime Environment - Model: LLaMA-2-7B - Using via huggingface?: no. I directly run LLaMA with the officially released scripts in this repo. - OS: Ubuntu - GPU VRAM: 24G - Number of GPUs: 1 - GPU Make: Nvidia",2024-02-29T03:40:31Z,llama,https://github.com/meta-llama/llama/issues/1048 1047,2159777060,Removing usage of open source,"Hi, I see this issue was raised previously, but it did not specifically address if the Llama issue is compliant with the OSI's definition of ""open source"". The OSI does not agree with the use of ""open source"" by this project: Is there plans to remove the term from the website? To reduce the threat of legal action. e.g Thanks ",2024-02-28T20:08:12Z,llama,https://github.com/meta-llama/llama/issues/1047 1045,2156966762,Got following error while running download.sh script,"Hi I got following errors while running the bash script ""download.sh"" My Runtime Environment - Model: [eg: ] - Using via huggingface?: No - OS: WSL - GPU VRAM: AMD Radeon - Number of GPUs: 2 - GPU Make: AMD ",2024-02-27T15:55:37Z,llama,https://github.com/meta-llama/llama/issues/1045 1044,2156744973,Pull request,This is my sample pull request,2024-02-27T14:29:54Z,llama,https://github.com/meta-llama/llama/pull/1044 1043,2154663116,Failed to run example_chat_completion.py because AssertionError on assert bsz <= params.max_batch_size,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug Fix NCCL issue via added a bunch of code at the beginning of generation.py ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: [no] - OS: [eg. Windows] - GPU VRAM: 16GB - Number of GPUs: 1 - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-02-26T17:00:59Z,llama,https://github.com/meta-llama/llama/issues/1043 1041,2154621757," raise RuntimeError(""Distributed package doesn't have NCCL "" ""built in"")","**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug Q1. Can we try llama on windows? Q2. How to solve the NCCL issue which is only for Linux? ### Minimal reproducible example ### Output and ## Runtime Environment - Model: [ ] - Using via huggingface?: [no] - OS: [eg. Windows] - GPU VRAM: 16GB - Number of GPUs: 1 - GPU Make: [eg: Nvidia] ",2024-02-26T16:40:01Z,llama,https://github.com/meta-llama/llama/issues/1041 1040,2154217331,"Request again but ""error submitting your email address"""," I failed to download the models last week and all the files are 0kb. So I wanna try again now and first of all I submitted a request again. However, I got the error as the figure shows.",2024-02-26T13:42:29Z,llama,https://github.com/meta-llama/llama/issues/1040 1039,2154013681,from llama import Llama ModuleNotFoundError: No module named 'llama',"I tried running this code while loading the model : !torchrun --nproc_per_node 1 --ckpt_dir --max_seq_len 128 --max_batch_size 6 Traceback (most recent call last): File line 6, in from llama import Llama ModuleNotFoundError: No module named 'llama' [2024-02-26 11 49,569] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 2661) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 806, in main run(args) File line 797, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 264, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2024-02-26_11 49 host : f9202a2d804c rank : 0 (local_rank: 0) exitcode : 1 (pid: 2661) error_file: traceback : To enable traceback see: ============================================================ Getting this error even when tried : pip install llama-cpp-python pip install llama==0.1.1 and various other. Any Solutiton ?",2024-02-26T12:03:47Z,llama,https://github.com/meta-llama/llama/issues/1039 1037,2152641189,Can not submit a requst.,"I am submitting a request to download the model. But as I click the ""Accept and Continue"" button, it says ""There was an error submitting your email address."". I tried several email addresses, including my Gmail and university mail, and the results were the same. So I just can't submit the request. Does anyone have the same issue as me?",2024-02-25T07:33:43Z,llama,https://github.com/meta-llama/llama/issues/1037 1036,2152494975,Cannot Get the Model,"As a PhD student, I have applied to access the llama model and I couldn't get any response for nearly a week. In the Hugging Face, it says "" Requests will be processed in 1-2 days."". My Hugging Face email and email that I wrote down on the Meta website is the same. So, is there any problem for accessing ? ",2024-02-24T21:48:31Z,llama,https://github.com/meta-llama/llama/issues/1036 1035,2150336900,Having trouble downloading the model,"I tried to run download.sh and entered the url given in the email sent for LLama 2 (I checked it) Then after I choose the model ( I decide to download all models so I just press the enter ), it quickly showed some message and then closed itself. This is the last thing I can screenshot: I don't know why and it seems that no model is downloaded.",2024-02-23T04:14:40Z,llama,https://github.com/meta-llama/llama/issues/1035 1034,2150144413,Segmentation fault,"Hi, I've uploaded the llama2 model image to Azure but I'm facing a **Segmentation fault** error in Python that is preventing my container to start. Any suggestions? ### Output ",2024-02-23T00:01:36Z,llama,https://github.com/meta-llama/llama/issues/1034 1033,2147834688,Update README.md,Repair URL to link to Llama examples safety checker. The existing URL was out of date.,2024-02-21T22:40:09Z,llama,https://github.com/meta-llama/llama/pull/1033 1032,2142114579,model weights dtype change in Llama.build,"when i run the inference as readme shows run the code, then inspect model weight dtype by: ### Output checkpoint['layers.31.ffn_norm.weight'].dtype -> torch.bfloat16 model.layers.31.ffn_norm.weight ->torch.float16 as the shows, it makes dtype change, but why make this dtype change? as far as i know, torch.float16 = 1 sign bit + 5 bits (exp) + 10bits (mantissa), torch.bfloat16 = 1 sign bit + 8 bits (exp) + 7bits (mantissa) therefore, after bfloat16 -> float16, if it occurs extreme number(eg: exp:0011111), it will cause loss of accuracy ## Runtime Environment - Model: - Using via huggingface?: no - OS: Ubuntu - GPU VRAM: 64G - Number of GPUs: 8 - GPU Make: AMD mi250 **Additional context** Add any other context about the problem or environment here. ",2024-02-19T11:08:34Z,llama,https://github.com/meta-llama/llama/issues/1032 1031,2140962010,There was an error submitting your email address.,"Hi, I'm trying to get license for a model, I did it before successfully. So I received and email with link, then I lost my model and would like to get it again. Now Im getting following error message when trying to obtain the license: There was an error submitting your email address. Any clues what goes wrong?",2024-02-18T12:20:10Z,llama,https://github.com/meta-llama/llama/issues/1031 1029,2131211923,TypeError in generate function when running example_chat_completion.py labels: bug,"## Issue Description Tried to run Llama-2-7b-chat using the command in the readme. When running the script with the specified command, the following error is encountered: `python File line 165, in generate total_len = min(params.max_seq_len, max_gen_len + min_prompt_len) TypeError: can only concatenate str (not ""int"") to str''' ",2024-02-13T00:12:05Z,llama,https://github.com/meta-llama/llama/issues/1029 1028,2128236644,commit,download link,2024-02-10T08:34:38Z,llama,https://github.com/meta-llama/llama/pull/1028 1025,2118699650,Llama local download : download.sh: line 19: wget: command not found,"$ bash download.sh Enter the URL from email: Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 70B-chat Downloading LICENSE and Acceptable Usage Policy download.sh: line 19: wget: command not found",2024-02-05T14:29:32Z,llama,https://github.com/meta-llama/llama/issues/1025 1023,2116150759,Llama version 1 Weights,"Is it still possible to get Llama version 1 weights, specifically 7B and 13B? I filled out the form again, but I'm worried it is being ignored. ",2024-02-03T03:23:21Z,llama,https://github.com/meta-llama/llama/issues/1023 1021,2113452226,Cannot download llama2 models using download.sh,"I am getting the following error when I execute download.sh using git bash on windows. Enter the URL from email: Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 70B Downloading LICENSE and Acceptable Usage Policy --2024-02-01 12 58-- Resolving download.llamameta.net... 18.154.144.45, 18.154.144.56, 18.154.144.95, ... Connecting to download.llamameta.net|18.154.144.45|:443... connected. OpenSSL: error SSL routines sslv3 alert handshake failure Unable to establish SSL connection. ",2024-02-01T20:48:08Z,llama,https://github.com/meta-llama/llama/issues/1021 1020,2109892006,Stuck on Tokenizer download - ERROR 403 : Forbidden,"I've already re-request a two links and always I get stuck with the error 403 at the tokenizer download. Here is my output: NOTICE **_?........................................_** is my key",2024-01-31T11:31:33Z,llama,https://github.com/meta-llama/llama/issues/1020 1019,2109521280,Why is the value of hidden_dim in FeedForward calculated this way?,"Why is the value of hidden_dim calculated this way? > hidden_dim = int(2 * hidden_dim 3) # custom dim factor multiplier if ffn_dim_multiplier is not None: hidden_dim = int(ffn_dim_multiplier * hidden_dim) hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) multiple_of)  ",2024-01-31T08:13:26Z,llama,https://github.com/meta-llama/llama/issues/1019 1018,2107817591,Llama2 7b quantized generqted either long or truncated reposnes ,"Hello, I'm working on a chatbot that uses Langchain's ChatOpenAI wrapper class to access a LLama2 7b quantized model that I have deployed on AWS using vLLM, my current problem is that the generated responses are either very long or truncated. if I set max tokens to 300 or lower the chatbot ends up generating a truncated response and if I set it to 512 or more then the chatbot ends up generating a very long response, I want my chatbot to conduct more of a human-like conversation and thus keep responses short. My question is, is there a way we can shorten the answers without truncating the LLM responses? I already played with all available model kwargs and the ChatOpenAI's parameters and I couldn't figure it out. Below is the code snippet used to initialize the model. Any suggestions please? ",2024-01-30T13:03:30Z,llama,https://github.com/meta-llama/llama/issues/1018 1017,2107179548,AssertionError: Loading a checkpoint for MP=8 but world size is 2,"Hi guys, I got an error while trying to deploy llama-2-70b-chat Command: torchrun --nproc_per_node 8 example_chat_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6 Error: initializing model parallel with size 8 initializing ddp with size 1 initializing pipeline with size 1 Traceback (most recent call last): File line 104, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 35, in main generator = Llama.build( File line 103, in build assert model_parallel_size == len( AssertionError: Loading a checkpoint for MP=2 but world size is 8 I have cloned the llama2 github repo, downloaded the model - download.sh and using example_chat_completion.py file, I am running on AWS EC2 instance with 8 GPUs.",2024-01-30T07:58:39Z,llama,https://github.com/meta-llama/llama/issues/1017 1016,2106799160,params.json: FAILED,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug After downloading all of the model parts for 70b-Instruct and 70b-Python I am getting the following error consolidated.01.pth: OK consolidated.02.pth: OK consolidated.03.pth: OK consolidated.04.pth: OK consolidated.05.pth: OK consolidated.06.pth: OK consolidated.07.pth: OK params.json: FAILED tokenizer.model: OK md5sum: WARNING: 1 line is improperly formatted md5sum: WARNING: 1 computed checksum did NOT match ### Minimal reproducible example This is the contents of params.json ### Output consolidated.01.pth: OK consolidated.02.pth: OK consolidated.03.pth: OK consolidated.04.pth: OK consolidated.05.pth: OK consolidated.06.pth: OK consolidated.07.pth: OK params.json: FAILED tokenizer.model: OK md5sum: WARNING: 1 line is improperly formatted md5sum: WARNING: 1 computed checksum did NOT match ## Runtime Environment - Model: [CodeLlama-70b-Instruct] - Using via huggingface?: [no] - OS: [Windows] - GPU VRAM: 24gb - Number of GPUs: 1 - GPU Make: [Nvidia] **Additional context** How does params.json fail? It exists. ",2024-01-30T01:58:19Z,llama,https://github.com/meta-llama/llama/issues/1016 1014,2104675375,Not able to download models in an Azure ubuntu VM. Getting 403 while downloading the models specifically.," Description: I am not able to download the llama-2 model in an Azure Ubuntu VM through SSH or through xRDP also. That too with the root user. Getting 403 status in the end. It is able to download UserPolicy and other files but not the model specific files. --2024-01-25 11 46-- ey-Pair-Id=*****&Download-Request-ID=***** Resolving download.llamameta.net (download.llamameta.net)... 108.159.61.30, 108.159.61.7, 108.159.61.34, ... Connecting to download.llamameta.net (download.llamameta.net)|108.159.61.30|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2024-01-25 11 47 ERROR 403: Forbidden. Checking checksums md5sum: checklist.chk: no properly formatted MD5 checksum lines found ## Runtime Environment - Model: llama-2-7b - Using via huggingface?: no - OS: Ubuntu It is an azure created Ubuntu VM. ",2024-01-29T05:23:40Z,llama,https://github.com/meta-llama/llama/issues/1014 1013,2103922887,abusandy143@gmail.com ,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-01-28T02:21:08Z,llama,https://github.com/meta-llama/llama/issues/1013 1012,2103464926,Not getting access to weights,"I submitted the form to access Llama2 weights on the day Meta released it. However, up until now, I have not received any email. I have filled out the form multiple times, but there has been no response on each occasion. Are there any eligibility criteria for this?",2024-01-27T09:09:32Z,llama,https://github.com/meta-llama/llama/issues/1012 1010,2099284193,pip install llama exits with NameError: name 'execfile' is not defined,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug Your example programs require the python package llama, however when I try and install the package pip displays the following error: Collecting llama Using cached llama-0.1.1.tar.gz (387 kB) Preparing metadata (setup.py) ... error error: subprocess-exited-with-error × python setup.py egg_info did not run successfully. │ exit code: 1 ╰─> [7 lines of output] Traceback (most recent call last): File """", line 2, in File """", line 34, in File line 6, in ^^^^^^^^ NameError: name 'execfile' is not defined [end of output] note: This error originates from a subprocess, and is likely not a problem with pip. error: metadata-generation-failed × Encountered error while generating package metadata. ╰─> See above for output. note: This is an issue with the package mentioned above, not pip. hint: See above for details. All 0.x versions have the same error. PyPi indicates this package is for python 2.x. I am using 3.11. I believe execfile was depricated on python 3. Is there another package I should be using? ### Minimal reproducible example pip install llama ### Output ## Runtime Environment - Model: [eg: ] llama-2-7b-chat - Using via huggingface?: no - OS: [eg. Windows] Linux - GPU VRAM: 2G - Number of GPUs: 1 - GPU Make: [eg: Nvidia, AMD, Intel] Nvidia **Additional context** Add any other context about the problem or environment here. ",2024-01-24T23:17:28Z,llama,https://github.com/meta-llama/llama/issues/1010 1009,2098956098,The llama2 model does not download.,"After running download.sh and entering the URL received via email and the required model, only the tokenizer is downloaded, and the model does not download without any error appearing. Do you have any idea why this might be happening? ### Minimal reproducible example ### Output ## Runtime Environment - Model: every - Using via huggingface?: no - OS: Linux - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Run on Runpod. ",2024-01-24T19:39:57Z,llama,https://github.com/meta-llama/llama/issues/1009 1008,2098704528,"OSError: Not found: ""./llama-2-7b-chat/tokenizer.model"": Too many levels of symbolic links Error #40","When following the and running step 2 Got the error ** OSError: Not found: Too many levels of symbolic links Error #40 ** How to fix it? ",2024-01-24T16:58:47Z,llama,https://github.com/meta-llama/llama/issues/1008 1007,2095413656,lama,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-01-23T07:19:58Z,llama,https://github.com/meta-llama/llama/issues/1007 1006,2092426302,"@skytin1004 If you can, could you resolve the conflicts and post an update? Thanks."," If you can, could you resolve the conflicts and post an update? Thanks. _Publicación original de en ",2024-01-21T05:16:23Z,llama,https://github.com/meta-llama/llama/issues/1006 1005,2091725541,Facing this error while running for the first time,"## Describe the bug ### Output ## Runtime Environment - Model: - Using via huggingface?: no - OS: [eg. Windows] Windows 11 - GPU VRAM: 12 GB - Number of GPUs: 1 - GPU Make: [eg: Nvidia, AMD, Intel] Nvidia ",2024-01-20T00:00:45Z,llama,https://github.com/meta-llama/llama/issues/1005 1004,2085360741,Why does the FeedForward have three linear layer?,"I find that the FFN implementation has three linear layers. But in the paper ""Attention Is All You Need"", FFN only has two linear layer. ",2024-01-17T04:19:30Z,llama,https://github.com/meta-llama/llama/issues/1004 1002,2081496398,llama2-7b-hf problem,"When i using llama2-7b-hf that i facing ValueError: Could not load model xxxxxxx with any of the following classes: (, ). How to solve this problem?",2024-01-15T08:28:21Z,llama,https://github.com/meta-llama/llama/issues/1002 1001,2079600519,Model Access Issue,"Hi, I applied on both Hugginface and the Meta website for using LLama-2. I also made sure that I entered the same email on both websites. However, on Huggingface, I get the message: . Perhaps I did something wrong in the process. I would appreciate if you can help me fix this issue and grant me access. My email is jingxhe Best, Jingxuan",2024-01-12T19:51:59Z,llama,https://github.com/meta-llama/llama/issues/1001 1000,2078364324,Why are ASCII chars in tokenizer?,"Why are all ASCII characters in the tokenizer file? For example ASCII 0x31 is actually 1 an in the vocab both tokens exist: ""<0x31>"": 52, ""1"": 29896, If the tokens represent the same char, why keep them twice? Although these are just 256 tokens, the embedding layer still increases in size.",2024-01-12T08:52:29Z,llama,https://github.com/meta-llama/llama/issues/1000 999,2075423646,Directory incorrect for params.json,"## Describe the bug I followed this blog to install the llama 2. In step 2, running this code would return with an error _no such file or directory: Apparently, there's no directory named 7B in llama-2-7b-chat. But there is indeed a params.json in llama-2-7b-chat. How do I fix this? ### Minimal reproducible example Just follow the blog step by step. Thanks in advance! ",2024-01-10T23:58:36Z,llama,https://github.com/meta-llama/llama/issues/999 998,2072353417,Does license allow dataset creation for small LMs?,"If I use llama 70b to create a dataset to train a small model like bert, does that violate the license. This phrase is the most relevant: > You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof). i am not a lawyer, but I would argue that bert is not a large language model.",2024-01-09T13:15:39Z,llama,https://github.com/meta-llama/llama/issues/998 997,2071941185,"Reusing existing connection to download2.llamameta.net:443. HTTP request sent, awaiting response... 403 Forbidden 2024-01-09 14:40:27 ERROR 403: Forbidden.","**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here. ",2024-01-09T09:14:00Z,llama,https://github.com/meta-llama/llama/issues/997 995,2067718369,which model to use for what's the root of 256256?,"Please, see Thank you for a curated answer.",2024-01-05T17:08:25Z,llama,https://github.com/meta-llama/llama/issues/995 994,2066788769,How to set up the LLaMA-2 model on our own server?,"I am trying to set up the LLaMA-2 model on my own server. What is the procedure for this, and what are the prerequisites? Can anyone please help me with the same?",2024-01-05T06:14:44Z,llama,https://github.com/meta-llama/llama/issues/994 993,2066556164,Question about total_len and max_gen_len,"Line 165 in generation.py sets as follows: ` total_len = min(params.max_seq_len, max_gen_len + max_prompt_len) ` The description of Line 165 in generation.py is: > max_gen_len (Optional[int], optional): Maximum length of the generated completion sequence. > If not provided, it's set to the model's maximum sequence length minus 1. Consider the following example for text completion: > Number of prompts = 2 > prompt 1 has 8 initial input tokens > prompt 2 has 13 initial input tokens > max_gen_len = 64 > max_seq_len = 512 In this case, , , , and . The model ends up producing tokens for both prompts until each has 77 tokens total. This means the model generated 69 tokens for the first prompt (and 64 tokens for the second prompt). This seems to be a violation of what is meant to enforce -- that the model should only be able to generate a maximum of 64 tokens per prompt. Should line 165 instead say: ` total_len = min(params.max_seq_len, max_gen_len + min_prompt_len) ` ? ",2024-01-05T00:37:30Z,llama,https://github.com/meta-llama/llama/issues/993 992,2065024403,Model Access Issue / Not Receiving Model Download Email,"Hi, It's been several days and I still don't have access to the model. I did not receive the Llama-2 model download email from Meta's open-source resources although I filled out the form, but still I have not received the model download email. Can you please grant me access? My email is sabdelmagid Thanks!",2024-01-04T05:28:42Z,llama,https://github.com/meta-llama/llama/issues/992 991,2063673789,"Can the llama 2 open-source model understand speech, images, and videos?","Can the llama 2 open-source model understand speech, images, and videos?",2024-01-03T10:06:24Z,llama,https://github.com/meta-llama/llama/issues/991 989,2063423859,Fixed #370 - Seq command compatability issue,"Fixed issue #370, Because seq command on some windows environments doesn't have the -f flag, it fails to download any of the models. Replaced with very basic seq command and printf for fomatting. Also fixed issue where model size default was not getting set. Tested on windows and ubuntu, does not appear to change functionality in any negative way. ",2024-01-03T07:47:03Z,llama,https://github.com/meta-llama/llama/pull/989 988,2061751941,SSL Error While Downloading,"## Describe the bug When running the script I get the following error: ### Minimal reproducible example ### Output ## Runtime Environment - Model: Not Relevant - Using via huggingface?: N0 - OS: Windows - GPU VRAM: Not Relevant - Number of GPUs: Not Relevant - GPU Make: Not Relevant **Additional context** Using Windows 11 ",2024-01-01T20:13:38Z,llama,https://github.com/meta-llama/llama/issues/988 987,2059254748,What is the best way for the inference process in LORA in PEFT approach,"Here is the SFTtrainer method i used for finetuning mistral I found different mechanisms for the finetuned model inference after PEFT based LORA finetuning Method - 1 save adapter after completing training and then merge with base model then use for inference Method - 2 save checkpoints during training and then use the checkpoint with the least loss Method - 3 same method with AutoPeftModelForCausalLM class Method-4 AutoPeftModelForCausalLM class specifies the output folder without specifying a specific checkpoint Method-5 All the above methods without merging Which is the actual method I should follow for inference? and when to use which method over another?",2023-12-29T09:49:08Z,llama,https://github.com/meta-llama/llama/issues/987 986,2058388571,Which is the actual way to store the Adapter after PEFT finetuning,"I am finetuning the mistral model using the following configurations during this training I am getting the multiple checkpoints in the specified output directory . Once the model training is over I can save the model using Not only that i can save the final model using So I bit confused. Which is the actual way to store the adapter after PEFT based lora fine-tuning whether it is 1 - Take the least loss checkpoint folder from the `output_dir trainer.save_model() trainer.model.save_pretrained(""path"") `",2023-12-28T12:37:25Z,llama,https://github.com/meta-llama/llama/issues/986 985,2056922886,CPU configuration for LLaMA 2,What is the optimal CPU configuration for running the Llama2 7B model for 200 parallel users?,2023-12-27T05:14:46Z,llama,https://github.com/meta-llama/llama/issues/985 983,2055906523,Error," after run torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 I got an error ""[2023-12-26 06 09,399] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 0 (pid: 963) of binary: like in image How to fix this issue",2023-12-25T23:16:14Z,llama,https://github.com/meta-llama/llama/issues/983 982,2054486999,Update download.sh, ,2023-12-22T20:32:39Z,llama,https://github.com/meta-llama/llama/pull/982 981,2053751004,unable to receive emails from Meta for downloading the mode,"I have been unable to receive emails from Meta for downloading the model, despite attempting with multiple email addresses. Could someone please suggest what steps I should take next?",2023-12-22T10:55:28Z,llama,https://github.com/meta-llama/llama/issues/981 980,2053166286,Renewing model download fails,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug I was previously approved for download access, when downloading using the link, it returns 403. I've resubmitted the form, but now I'm getting ""Sorry, you are not eligible to access Llama 2."" Can you tell me why I'm no longer eligible?",2023-12-21T23:20:31Z,llama,https://github.com/meta-llama/llama/issues/980 979,2052715367,how was the base model created?,"hi i am wondering myself as a noob... how was the base model and the model files created when you pre-trained llama2-7b for example? As far as I see, this repo just contains code for inference and not for the pre-training process. can you give a short exmplaining how you created the model files initially? Thanks and BR Timo",2023-12-21T16:19:13Z,llama,https://github.com/meta-llama/llama/issues/979 977,2049721368,Few Shot Learning in Chatbot manner?,"Howdy, really appreciate your amazing work, and thank you for all the efforts that have been made. I want to ask about some procedures for doing few-shot learning in the LLama2 chatbot setting. I am following the example provided in example_chat_completion.py and have some confusion about the manner. I want to make sure that few-shot examples are in the following manner: If that is the case, I have another issue: if I want to do many-shot learning, the model will encounter: _**This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (4096). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.**_ The output from LLama2 also becomes mojibakes. (Here, I assemble all examples in one dialog, and is that the reason for exceeding the maximum token limit? If that is, will splitting examples into multiple dialogs help mitigate this issue, but it is also kind of making the many-shot learning into multiple few-shot learnings?) Do you have any suggestions on implementing a many-shot learning on the LLama2 chatbot? Grateful for any advice!",2023-12-20T02:21:49Z,llama,https://github.com/meta-llama/llama/issues/977 976,2047130224,Import Error Flash Attention during Training LlaMa-2," ## Describe the bug I am trying to fine-tune Llama-1 in RTX 6000 Ada...and I was able to validate the model but when I tried to run the fine tune the model I got the error as shown in the below image ### Output ## Runtime Environment - Model: llama-2-7b-hf - Using via huggingface?: yes - OS: 22.04 - GPU VRAM: RTX 6000 Ada, 48GB - Number of GPUs: 1 - Nvidia ",2023-12-18T17:33:15Z,llama,https://github.com/meta-llama/llama/issues/976 975,2046445446,How to use low amount of memory and high concurrent users when using LLAMA-2-7b-chat model for Inference?,"Hello, First I used the LLAMA-2-7b-chat with flask and gunicorn. I tried it with single worker and used F16 torch dtype. Model itself was consuming about 14GB of memory on GPU(using NVIDIA A10G) and later for model inference it was taking about 3+GB. with that I cannot continue as It will need more memory for inference for new requests and the GPU has only 24GB. I also have to add a system prompt in it at the time of inference only at first when user requested api first time. Later I searched for quantized model and I used quantized model and it's taking only 4328MB on GPU, the main problem is of inference, it takes 1500MB(start of using) to 5038MB(with previous data) of memory on GPU. When I used multiple workers the model was loaded multiple times and with , an error raised to use spawn with start_method, so after a long google search I found a stackoverflow answer and I used it to use low memory for multiple workers and yes with the model loaded only once and then I was sharing with all workers, the main problem still exists the inference when requests number increases. Do I have to limit the users for the input or is there any other configuration with that I can handle more concurrent users. The main goal is to large number of concurrent requests with low latency, the main use is Inference only.",2023-12-18T11:22:41Z,llama,https://github.com/meta-llama/llama/issues/975 973,2045021512,Time to fine-tune on 1m samples(13b),"Hello! I have a chat dataset with about 1 million samples. On an H100, how long will fine-tuning llama 2 13b for one epoch take?",2023-12-17T01:56:20Z,llama,https://github.com/meta-llama/llama/issues/973 971,2043741174,Llama-2-70b-chat-hf get worse result than Llama-2-70B-Chat-GPTQ,"I am trying to use Llama-2-70b-chat-hf as zero-shot text classifier for my datasets. Here is my setups. 1. vLLM + Llama-2-70b-chat-hf I used vLLM as my inference engine as run it with: api_server.py is the example file and I do not modify anything. client code: And my prompt is: The classification accuracy is 0.352. And I also tried to use the same prompt and parameter(temperature and max_token) to call chatgpt and gpt-4, the got 0.68 and 0.72 respectively. Llama 2 shouldn't be significantly worse than ChatGPT. There must be something wrong with it. So I suspect it may be related to vLLM. So I tried the following method. 2. Transformer + flask It's not a good serving method, maybe I should use tgi. But I think it's easy for locating problem. And the client code: I used the same prompt as before. And the accuracy is 0.35. It's similar to vLLM. Now it seems there is not the problem of vLLM. What's wrong with it? Is Llama 2 70b a very bad model? I don't think so. So I tried the 3rd method. 3. Transformer(using Llama-2-70B-Chat-GPTQ ) + flask The setup is the same as method 2, I only change model: I saved Llama-2-70B-chat-GPTQ by saved_pretrained and forget saved the tokenizer, So I use the tokenizer of Llama2 7B-chat(I think all Llama 2 tokenizer is the same for different mode size). This time I got a better result of 0.56. It's not good as chatgpt but is significant better than uncompressed Llama-2-70B-chat. So I am confused that original Llama-2-70B-chat is 20% worse than Llama-2-70B-chat-GPTQ. Method 2 and Method 3 are exactly the same except for different model.",2023-12-15T13:35:07Z,llama,https://github.com/meta-llama/llama/issues/971 970,2043067800,How can I give different prompts in batched.cpp ?,"Recently I have seen the which can run the llama with multiple batch, but this project only give one prompt then output different results. I want to give different prompts as input and test the multiple batch output, How can I do this?",2023-12-15T07:39:17Z,llama,https://github.com/meta-llama/llama/issues/970 968,2042614491,Optim - added quantization code.,Added quantization code mainly inside generator.py and model.py - but show very marginal improvements in timing for batch sizes.,2023-12-14T22:43:05Z,llama,https://github.com/meta-llama/llama/pull/968 967,2042343069,Torchscript, ,2023-12-14T19:33:24Z,llama,https://github.com/meta-llama/llama/pull/967 965,2041505017,Are meta-llama/Llama-2 models Quantized by default?,"I looked for information about this here: But couldn't find any. Are models Quantized by default? How are we supposed to use quantized models like llama.cpp. I see TheBloke has quantized versions for llama-2 models like: Or quantize it yourself? ",2023-12-14T11:30:23Z,llama,https://github.com/meta-llama/llama/issues/965 964,2041395243,Able to run 70B 4Q Llama2 on MacBook -- but unexpectedly not 2Q version ,"## Describe the bug I am trying to run the 70B Llama model thru Ollama on my M3 Pro macbook with 36 gb of RAM. I'm informed that this is likely too little RAM for this model, however I am able to run the 4Q version just fine - although extremely slowly. So I thought I'd try the 2Q (chat) variant instead - but this version consistently fails with this output: Attaching the memory usage graph for both. The 3Q version also fails. 4Q version (standard) consistently works every time (although slow as syrup) I'm a bit of a newbie - but I thought this was interesting, and I was wondering of ways of how I could go about using the more quantized versions for faster performance. Perhaps there is an implementation issue between the standard model and the different quantization versions? ### Minimal reproducible example Run 70B 4Q - then run 70B 2Q on a M3 Pro 36gb ### Output Running 70B 4Q Failing to run 70B 2Q ## Runtime Environment - Model: [eg: ] - Using via huggingface? no, ollama - OS: Mac - GPU VRAM: 36gb - Number of GPUs: - GPU Make: Apple",2023-12-14T10:25:15Z,llama,https://github.com/meta-llama/llama/issues/964 963,2040787832,Whether a word vector can inversely derive a word.(community-discussion),"In a black box scenario, if a word vector is stolen, can an attacker deduce the word from it?",2023-12-14T02:31:52Z,llama,https://github.com/meta-llama/llama/issues/963 962,2038584861,Wrong pending for approval for LLama-2 message,"Even though I am approved and received an email from Meta, I get the following message: **Your request to access this repo has been successfully submitted, and is pending a review from the repo's authors.** History: The request was pending, so I went to the Meta site and re-registered. I got an immediate email. Perhaps when I registered from Hugging face, the emails were not identical. Would appreciate if you can help fix the issue. ",2023-12-12T21:36:19Z,llama,https://github.com/meta-llama/llama/issues/962 961,2038305411,Is the code in this repository only for inference?,"Can we finetune a llama using the model structure defined in this repository? I know we can use Huggingface codes to do the finetune. But I want to slightly modify the model architecture then do the finetune. The Huggingface class seems not flexible enough to do that. I have tried to use these code to finetune (build a Transformer class, load checkpoints, then use the Transformer to update the weights), but a lot of bug occurs.",2023-12-12T18:07:13Z,llama,https://github.com/meta-llama/llama/issues/961 960,2034372612,ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9),"## Problem Description After completing setup for CodeLlama, from the README.md, when I attempt to run any of the models, with the specified commands: OR OR I get the output with the error below: ### Output ## Runtime Environment - Model: [ , , ] - Using via huggingface?: [no] - OS: (via WSL2), Windows] - GPU VRAM: 4GB - Number of GPUs: 1 - GPU Make: [Nvidia] - GPU Version: NVIDIA GeForce GTX 1650 **Additional context** I am trying to run the models on Ubuntu through WSL 2, I tried setting the batch size to 6 ( ) as was mentioned in #706 but this did not help. ",2023-12-10T13:31:24Z,llama,https://github.com/meta-llama/llama/issues/960 959,2032876729,SafetensorError: Error while deserializing header: HeaderTooLarge ," ## Describe the bug I tried to load llama-2-70b-chat-hf with transformers, but I got an error: SafetensorError: Error while deserializing header: HeaderTooLarge ### Below is the code to execute ### Output Some error msg below ## Runtime Environment - Model: [llama-2-70b-chat-hf] - Using via huggingface?: [yes] - OS: - GPU VRAM: 81 G - Number of GPUs: 1 - GPU Make: [eg: Nvidia] I re-downloaded the safetensors file but could not solve it. Look forward to your reply ASAP. ",2023-12-08T15:36:00Z,llama,https://github.com/meta-llama/llama/issues/959 957,2032257456,Speed Issues with Local Inference of llama2-70B-chat Model,"Hi there, I hope this message finds you well. I am writing to report a performance issue I encountered while running the llama2-70B-chat model locally on an 8*A100 (80G) device. After downloading and configuring the model using the provided download.sh script, I attempted to run the example_chat_completion.py script with the following command: However, I encountered a RuntimeError related to inplace update to an inference tensor outside of InferenceMode. Following the advice given in this GitHub issue, I replaced in model.py and generation.py. This resolved the initial error, allowing the model to run locally. Nevertheless, I noticed a significant discrepancy in inference speed between the local environment and the online version available at this GitHub issue. Locally, the model takes approximately 5 minutes for each inference, while the online version provides almost real-time results. I have a few questions and concerns: 1. **Performance Discrepancy:** Is it reasonable to expect a difference in inference speed between local and online environments, or could there be an underlying issue with my local setup? 2. **Impact of Does replacing have any significant impact on the inference speed? Could it be a contributing factor to the observed slowdown? 3. **Hugging Face Models:** Would using the Hugging Face version of the model result in faster inference speeds compared to the locally configured llama2-70B-chat model? 4. **Optimizations for Local Inference:** Are there any specific optimizations or configurations, such as flash attention, that could be applied to improve the local inference speed? I appreciate your assistance in addressing these concerns and would be grateful for any guidance or recommendations to optimize the local performance of the llama2-70B-chat model. Thank you for your time and attention to this matter. Best regards, BAI Fan",2023-12-08T09:03:07Z,llama,https://github.com/meta-llama/llama/issues/957 956,2031836477,Can't get approved to access llama 2,"Hey all, sorry to post this here. I've applied to access llama 2 models via several times with several different emails and orgs. I always receive the ""Sorry, you are not eligible to access Llama 2"" email (two of them actually). Are no new applications being accepted, or perhaps a bug?",2023-12-08T02:43:48Z,llama,https://github.com/meta-llama/llama/issues/956 954,2031211029,Cuda OutOfMemoryError [Nvidia GeForce GTX 1080 Ti (11 GB )+ 24GB Ram] ,"I am trying to run the llama-2-7b out of the box with the following command `torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 1 ## Runtime Environment - Model: llama-2-7b - Using via huggingface?: no - OS: Windows - GPU VRAM: 11 GB - Number of GPUs: 1 - GPU Make: Nvidia - RAM: 24GB Disclaimer: I am an application engineer and not much into data science :-) just wanted to ask following questions 1. is it really possible to run the pyTorch model with these specs above? or going to a quantized model is better? 2. Why can't I increase the CUDA memory to use complete GPU (11Gb in my case, but it only allocates 4GB as per error) PS: i have changed the following nccl -> gloo based on some recommendations to make it work till here ",2023-12-07T17:31:47Z,llama,https://github.com/meta-llama/llama/issues/954 952,2028812044,Missing Dates in Download Access Request Page," The page to download the model has bugs in the date drop down. - February has 31 days (instead of 28 or 29). - March has 28 days (instead of 31). - April has 31 days (instead of 30). - May has 30 days (instead of 31). - June has 31 days (instead of 30). - July has 30 days (instead of 31). - et cetera... This appears to be an off-by-one bug as all the number of days in the month are off by one month.",2023-12-06T15:29:02Z,llama,https://github.com/meta-llama/llama/issues/952 951,2027719011,Embed size disparity,"Hello, I have been passing texts into llama2 7B to embed them and then use that data for a different DRL algorithm. I am trying to figure out what the different values of the embed tensors are? for example if i just pass a prompt of ""h"" for the 7B model in user mode: {""role"": ""user"", ""content"": ""h""} I then get a tensor of this size in model.py forward function of the transformer class : h = self.tok_embeddings(tokens) h.shape = torch.Size([1, 9, 4096]) What are the different values in the tensor? Is the second value (9 in this case) variable based on the size of the tokens?",2023-12-06T05:54:02Z,llama,https://github.com/meta-llama/llama/issues/951 950,2025398325,training loss curve of llama 1 and 2,"thanks for your awesome work! I have a question about the training curve of llama 1 and 2. in the training of llama 1, some loss spikes ocurred, but it is not the case for llama2. why did these spikes occur? because of datasets? ",2023-12-05T06:45:36Z,llama,https://github.com/meta-llama/llama/issues/950 949,2022479709,Error while running,"I run the code like (myenv) --nproc_per_node 1 example_chat_completion.py --ckpt_dir llama-2-7b --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6 I am getting issue like [2023-12-03 16 26,062] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs. [W socket.cpp:663] [c10d] The client socket has failed to connect to [BRNHYD0122L005]:29500 (system error: 10049 - The requested address is not valid in its context.). [W socket.cpp:663] [c10d] The client socket has failed to connect to [BRNHYD0122L005]:29500 (system error: 10049 - The requested address is not valid in its context.). Traceback (most recent call last): File ""example_chat_completion.py"", line 104, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File ""example_chat_completion.py"", line 35, in main generator = Llama.build( File line 85, in build torch.distributed.init_process_group(""nccl"") File line 74, in wrapper func_return = func(*args, **kwargs) File line 1148, in init_process_group default_pg, _ = _new_process_group_helper( File line 1268, in _new_process_group_helper raise RuntimeError(""Distributed package doesn't have NCCL built in"") RuntimeError: Distributed package doesn't have NCCL built in [2023-12-03 16 31,150] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 23260) of binary: Traceback (most recent call last): File line 194, in _run_module_as_main return _run_code(code, main_globals, None, File line 87, in _run_code exec(code, run_globals) File line 7, in File line 346, in wrapper return f(*args, **kwargs) File line 806, in main run(args) File line 797, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 264, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_chat_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-12-03_16 31 host : BRNHYD0122L005 rank : 0 (local_rank: 0) exitcode : 1 (pid: 23260) error_file: traceback : To enable traceback see: I created a conda environment after I run the 'pip install requirements.txt' also I install torch using 'pip install torch' after I tried to run the llama 7B model with torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir llama-2-7b --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6 but still facing the issue can any one help me to resolve the issue please and my laptop configurations Ram : 16GB os: windows 11 pro 64 bit bios: f.63 processor: 11th gen intel(R) core (TM) i5, 2.4z GHZ (8cpus) system model: HP 250 G8 Notebook pc SSD: 500GB GPU: 7.9GB ",2023-12-03T11:33:20Z,llama,https://github.com/meta-llama/llama/issues/949 948,2022280279,Download 403,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug The download process is now providing a 403 after previously functioning just a couple of days ago. Not sure if there is a bug or some sort of expire in the access token. It seems that such an access token should not expire so readily. ### Minimal reproducible example bash download.sh ### Output '''Resolving download.llamameta.net (download.llamameta.net)... 18.244.202.48, 18.244.202.110, 18.244.202.69, ... Connecting to download.llamameta.net (download.llamameta.net)|18.244.202.48|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2023-12-02 20 45 ERROR 403: Forbidden.''' ## Runtime Environment ",2023-12-03T01:23:08Z,llama,https://github.com/meta-llama/llama/issues/948 947,2022187868,Support for Mac M1/M2,"Adds support for Apple Silicon processors by using instead of CUDA. Same changes as in the Code Llama PR Tested on M1 Max, macOS 13.4 (Ventura), pytorch 2.1.1 ",2023-12-02T20:13:08Z,llama,https://github.com/meta-llama/llama/pull/947 946,2021720608,CUDA error: invalid device ordinal ,"## Describe the bug When trying the it throws out . I can confirm I have CUDA environment up as CUDA Device Query reports back the nVidia 3090 with no problem and conda is activated. ### Minimal reproducible example ### Output ## Runtime Environment - Model: - Using via huggingface?: no - OS: Ubuntu WSL2 on Windows with direct access to host GPU - GPU VRAM: 24GB - Number of GPUs: 1 - GPU Make: Nvidia 3090 **Additional context** CUDA Device Query reports the GPU correctly as below: ",2023-12-01T23:20:11Z,llama,https://github.com/meta-llama/llama/issues/946 945,2021397342,test commit for december hack, ,2023-12-01T18:37:49Z,llama,https://github.com/meta-llama/llama/pull/945 944,2020832225,How to Finetune?,"Hello, i want to use **llama-7B** for **chatbots**. **How can I finetune the model?** I want to teach its name, purpose. Trying to make human like conservation is necessary. **Should I use 7B-Chat** version too? Or is 7B enough? For the last question **how should be my dataset?** .csv, .txt or csv, .txt or .json? .json? Is there any kind of example for finetune like that?",2023-12-01T12:56:31Z,llama,https://github.com/meta-llama/llama/issues/944 943,2017946091,How do I train using a custom dataset?,"I understand how to create a training dataset in json. But I'm curious how I can proceed with my learning. Is there separate source code? If you have any related references, please share them.",2023-11-30T06:07:39Z,llama,https://github.com/meta-llama/llama/issues/943 942,2013813647,Llama 2 Access on Hugging Face,"Hello, I have received an email for access to the Llama-2 models but am still waiting on access through HuggingFace. This is my mistake, I believe I submitted the request on HuggingFace prior to submitting on the Meta website; is there a way to gain access on HF? My email is rosiezhao Sorry for the inconvenience, I appreciate the help! ",2023-11-28T07:22:12Z,llama,https://github.com/meta-llama/llama/issues/942 941,2010850675,Llama 2 access,"I have made multiple requests for model access but haven't received an approval yet. Email: arunas I made requests through the google form all these days. Found a new way to make the request through today after checking github issues. Request ID: 7702409196444760. Kindly approve it at the earliest! (Sorry for cc-ing you without checking, but I saw that you've been approving most requests. Thank you!)",2023-11-26T01:19:56Z,llama,https://github.com/meta-llama/llama/issues/941 940,2010412903,Files disappeared after download is finished," ## Describe the bug Hi everyone, I've tried to downloading the 7b and 13b models into my MacBook Pro m2 max computer and everything was working well. However once I finished downloading the models, the files that were supposed to contain them disappeared and I can't find them anywhere on the computer. When I looked at my computer storage, it appears that no space was taken, and yet it seemed the download was successful. What happened? ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: [no] - OS: [eg. MacOS] ",2023-11-25T01:13:02Z,llama,https://github.com/meta-llama/llama/issues/940 938,2006245821,why the mask hstack in model.py?,"Here is the code in model.py (line 482) Except the prompt input, the followed generated tokens are all only one token (seqlen=1). It means this mask operation only used for the first input(with prompt), and so the is always zero, the operation here actually doesn't do anything. Is anyone who knows the effect here?",2023-11-22T12:33:17Z,llama,https://github.com/meta-llama/llama/issues/938 937,2005797753,Llama2 Prompt Engineering,"[ Hi I'm studying about llama2. I'm trying to create a chat bot using llama open source, and My goal is to receive accurate answers when asked about embedded data. A query engine is built by embedding external data in the RAG system created through LlamaIndex. I'm also attempting prompt engineering using Few-shot Prompting, **CoT (Chain of Thought), and Self-Consistentcy.** ] [ Here's the problem. If I insert an example of a date in the prompt and ask a question about a single date (ex: 2023-10-19 14 29), the model will answer well. However, if I insert several examples in the prompt and ask about the **date range (ex: 2023-10-19 14 00 ~ 2023-10-19 15 00), llama will get an incorrect answer.** ] ------------------------------------------------ When what llama provides is accurate information, the prompt written is as follows: ↓↓↓↓ Your goal is to provide answers relating to the system logs. You are a technical supporter who is helpful to the company. All you have to do is find what I'm asking for in the csv file embedded in you, read each item, and answer. My csv file is a log file consisting of Type, Occurred Time, IP Address, and Log Details. If you don't know the answer to a question, don't share false information. For example: user : Please tell me the log with occured time '2023-10-17 16 33' response : The log that occurred on 2023-10-17 16 33 is as follows: Type : Information Occurred Time : 2023-10-17 13 09 IP Address : 192.~~ Log Details : Unable to establish a connection to the database. This is the answer given by llama2 : ↓↓↓↓ Of course! I'm happy to help you with that. Based on the information provided in the CSV ile. the log with ... Type : Error Occured time : 2023-10-17 16 33 IP Address : 192.~ Log Details : SQL syntax error in query. I hope that helps! Let me know if you have any other questions. It's a good example. That is actually correct information about embedded data. ---------------------------------------------------------- When what llama provides is incorrect information, the prompt written is as follows: ↓↓↓↓ Your goal is to provide answers relating to the system logs. You are a technical supporter who is helpful to the company. All you have to do is find what I'm asking for in the csv file embedded in you, read each item, and answer. My csv file is a log file consisting of Type, Occurred Time, IP Address, and Log Details. If you don't know the answer to a question, don't share false information. For example 1 : user : Please tell me the log with occured time '2023-10-17 16 33' response : The log that occurred on 2023-10-17 16 33 is as follows: Type : Error Occurred Time : 2023-10-17 16 33 IP Address : 192.~~ Log Details : Unable to establish a connection to the database. For example 2 : user :Please tell me all the logs whose type is warning and the occurrence time is October 19, 2023. reponse : There are 3 in total. Here are the details of the warning log that occurred on October 19th. Type : Warning Occured Time : 2023-10-19 14 23 P Address : 192.~~ Log Details : Low disk space detected on drive Type : Warning Occured Time : 2023-10-19 15 51 IP Address : 192.~~ Log Details : Low disk space detected on drive Type : Warning Occured Time : 2023-10-19 14 38 IP Address : 192.~~ Log Details : Low disk space detected on drive This is the answer given by llama2 : ↓↓↓↓ Based on the information provided in the CSV file and the context information, there are 8 logs with a type of Warning and an occurrence time of October 18, 2023. Here are the details of those logs: 1. Occurred Time: 2023-10-18 10 43 IP Address: 10.3.1.136 Log Details: Multiple failed login attempts 2. Occurred Time: 2023-10-18 11 51 IP Address: 10.3.1.136 Log Details: Multiple failed login attempts It's a wrong example. That is actually incorrect information about embedded data. ------------------------------------------------ [ To conclude, I wonder if **llama2 is not good at recognizing dates.** It embedded data containing dates and showed examples in the prompt, but it doesn't seem to understand **date ranges** other than single dates. For example, When I asked to know all the logs that occurred on October 19th, if there were actually 8 logs that occurred, I would like them to answer all 8. ] What do I need?? I use that model : ",2023-11-22T08:14:44Z,llama,https://github.com/meta-llama/llama/issues/937 936,2004345591,Running llama-2-13b for inferencing in Windows 11 WSL2 resulted in `Killed`,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** I did a search for the keyword , but could not find a related issue. ## Describe the bug + Minimal reproducible example This is my run.py code: This is my adapter_config.json code: These are my hardware specs: I'm using Windows 11 WSL2 Bash to run this command: I have set my .wslconfig file as follows: ### Output I expect a chat message to be displayed and a prompt for my chat input, but this is the actual output: How do I resolve this? Should I be testing llama-13b first before llama-2-13b? ## Runtime Environment - Model: - Using via huggingface?: no, the files had been downloaded. - OS: Windows 11 WSL2 - GPU VRAM: 7971 MB - Number of GPUs: 2 - GPU Make: Intel, and Nvidia ",2023-11-21T13:56:15Z,llama,https://github.com/meta-llama/llama/issues/936 934,1998366649,## Environment,"## Environment System: OS: macOS 12.6.8 CPU: (8) x64 Apple M1 Pro Memory: 27.49 MB 16.00 GB Shell: 5.8.1 - Binaries: Node: 16.16.0 - Yarn: 1.22.19 - npm: 7.24.2 - Watchman: 2023.07.10.00 - Managers: CocoaPods: Not Found SDKs: iOS SDK: Platforms: DriverKit 22.2, iOS 16.2, macOS 13.1, tvOS 16.1, watchOS 9.1 Android SDK: API Levels: 23, 28, 30, 31, 33 Build Tools: 30.0.2, 30.0.3, 33.0.0, 33.0.2 System Images: android-26 | ARM 64 v8a, android-27 | ARM 64 v8a, android-28 | Google ARM64-V8a Play ARM 64 v8a, android-31 | ARM 64 v8a, android-33 | Google APIs ARM 64 v8a, android-33 | Google Play ARM 64 v8a Android NDK: Not Found IDEs: Android Studio: 2022.2 AI-222.4459.24.2221.9971841 Xcode: - Languages: Java: 11.0.19 - npmPackages: Not Found react: 18.2.0 => 18.2.0 react-native: 0.72.0 => 0.72.0 react-native-macos: Not Found npmGlobalPackages: *react-native*: Not Found ## Things I’ve done to figure out my issue - I used upgrade-helper to do my upgrade. ## Upgrading version React Native 0.72.0 ## Description I've followed the each and every steps React Native Upgrade document to upgrade my current project from 0.68.5 to 0.72.0 and I've updated all the dependency of my project into the latest version. After that, when i tried to run my project locally i'm getting duplicate dependency error message. I've posted the screenshot below. **Package.json** ""dependencies"": { "" ""^11.0.0-next.18"", "" ""^11.9.0"", "" ""^6.3.1"", "" ""^2.0.4"", "" ""^3.0.7"", "" ""^8.2.0"", "" ""^5.1.4"", "" ""^1.5.1"", "" ""7.4.1"", "" ""^0.1.11"", "" ""9.4.1"", "" ""^1.8.1"", "" ""^6.2.1"", "" ""^16.5.0"", "" ""^16.5.0"", "" ""^16.5.0"", "" ""^16.5.0"", "" ""^16.5.0"", "" ""^16.5.0"", "" ""^5.11.15"", "" ""^5.3.19"", "" ""^5.9.8"", "" ""^5.14.9"", "" ""^4.22.0"", "" ""^4.22.0"", ""jest"": ""^28.1.3"", ""jest-fail-on-console"": ""^3.0.2"", ""lodash.throttle"": ""^4.1.1"", ""lottie-react-native"": ""^5.1.4"", ""moment"": ""^2.29.3"", ""npm"": ""^7.22.0"", ""patch-package"": ""^6.4.7"", ""path"": ""^0.12.7"", ""postinstall-postinstall"": ""^2.1.0"", ""react"": ""18.2.0"", ""react-hook-form"": ""^7.43.2"", ""react-native"": ""0.72.0"", ""react-native-animatable"": ""^1.3.3"", ""react-native-appsflyer"": ""^6.5.21"", ""react-native-auth0"": ""^2.13.1"", ""react-native-barcode-builder"": ""^2.0.0"", ""react-native-base64"": ""^0.2.1"", ""react-native-color-matrix-image-filters"": ""^5.2.14"", ""react-native-custom-switch-new"": ""^1.0.3"", ""react-native-device-info"": ""^8.7.1"", ""react-native-dotenv"": ""^3.3.1"", ""react-native-fast-image"": ""^8.6.1"", ""react-native-forter"": ""https zvGKcVtDhkfj4asNekSn ""react-native-fs"": ""^2.20.0"", ""react-native-geolocation-service"": ""^5.3.0-beta.4"", ""react-native-gesture-handler"": ""^1.10.3"", ""react-native-get-random-values"": ""^1.9.0"", ""react-native-image-crop-picker"": ""^0.39.0"", ""react-native-in-app-review"": ""4.1.1"", ""react-native-json-tree"": ""^1.3.0"", ""react-native-linear-gradient"": ""^2.5.6"", ""react-native-localize"": ""^2.2.1"", ""react-native-maps"": ""^1.3.1"", ""react-native-modal-datetime-picker"": ""^11.0.0"", ""react-native-onetrust-cmp"": ""^202306.2.0"", ""react-native-pager-view"": ""^6.0.0"", ""react-native-permissions"": ""^3.6.1"", ""react-native-progress"": ""^5.0.0"", ""react-native-reanimated"": ""^3.3.0"", ""react-native-render-html"": ""^6.3.4"", ""react-native-restart"": ""^0.0.22"", ""react-native-safe-area-context"": ""^3.3.2"", ""react-native-screens"": ""3.6.0"", ""react-native-scroll-bottom-sheet"": ""^0.7.0"", ""react-native-secure-key-store"": ""^2.0.9"", ""react-native-sha256"": ""^1.4.7"", ""react-native-share"": ""^7.4.1"", ""react-native-splash-screen"": ""^3.3.0"", ""react-native-stars"": ""^1.2.2"", ""react-native-svg"": ""^12.3.0"", ""react-native-tab-view"": ""^2.16.0"", ""react-native-tracking-transparency"": ""^0.1.1"", ""react-native-vector-icons"": ""^9.1.0"", ""react-native-webview"": ""^11.18.2"", ""sanitize-html"": ""^2.7.0"", ""tealium-react-native"": ""^2.2.0"", ""usabilla-react-native"": ""^1.0.0"", ""uuid"": ""^9.0.0"" }, ""devDependencies"": { "" ""^7.12.9"", "" ""^7.12.9"", "" ""^3.1.0"", "" ""^6.4.22"", "" ""^5.3.19"", "" ""^6.4.22"", "" ""^5.3.25"", "" ""^6.4.22"", "" ""^5.3.25"", "" ""^5.3.23"", "" ""^4.0.4"", "" ""^7.0.2"", "" ""^9.1.0"", "" ""^28.1.5"", "" ""^7.19.0"", "" ""^2.13.1"", "" ""^0.2.0"", "" ""^0.2.0"", "" ""^3.3.3"", "" ""17.0.2"", "" ""^2.6.2"", "" ""^4.29.2"", "" ""^4.30.0"", ""babel-jest"": ""^28.1.3"", ""babel-loader"": ""^8.2.5"", ""babel-plugin-module-resolver"": ""^4.1.0"", ""concurrently"": ""^6.2.1"", ""cross-env"": ""^7.0.3"", ""cspell"": ""^5.21.0"", ""eslint"": ""^7.32.0"", ""eslint-import-resolver-typescript"": ""^3.5.1"", ""eslint-plugin-import"": ""^2.26.0"", ""eslint-plugin-jest"": ""^26.2.2"", ""husky"": ""^7.0.0"", ""metro-react-native-babel-preset"": ""^0.70.3"", ""node-jq"": ""^2.3.3"", ""prettier"": ""^2.6.2"", ""react-hooks-testing-library"": ""^0.6.0"", ""react-native-cli-bump-version"": ""^1.4.0"", ""react-native-svg-transformer"": ""^0.14.3"", ""react-test-renderer"": ""18.0.0"", ""typescript"": ""4.3.5"", ""uri-scheme"": ""^1.0.120"" } _オリジナルは が にポスト_",2023-11-17T06:27:49Z,llama,https://github.com/meta-llama/llama/issues/934 933,1998173683,Evaluating Llama-70b on ARC-e/c,"Hello, I'm trying to reproduce the results the paper mentions for but I'm getting a accuracy of 38.3 on ARC-c, where as the paper mentions an accuracy of 57.4. I tried two methods since this is a MCQ dataset: 1) Extracting the output from the generated text 2) Calculating logits (same as what lm-eval-harness does) The first method didn't work out too well, since the model would generate randomly formatted outputs and answer questions that were out of the choices given. The logits method gives me a 38.3% accuracy. Could you guide me to the correct method? Much appreciated, Thank you!",2023-11-17T02:54:08Z,llama,https://github.com/meta-llama/llama/issues/933 932,1997544780,Few shot prompting,"Hi, Which model (either chat or text-completion) should be used for in-context learning using few-shot prompting?",2023-11-16T18:59:28Z,llama,https://github.com/meta-llama/llama/issues/932 931,1995531067,### 🦋 Changeset detected,"### 🦋 Changeset detected Latest commit: 1ed097e8c1837607b18ea4efced7c8a27ab39d53 **The changes in this PR will be included in the next version bump.** Not sure what this means? Click here to learn what changesets are. Click here to learn what changesets are _オリジナルは が にポスト_",2023-11-15T20:39:15Z,llama,https://github.com/meta-llama/llama/issues/931 930,1995527477,## Describe the bug,"## Describe the bug I have download the llama-2-13b-chat, but when I run the commond as fallow, I got errors: >LOGLEVEL=DEBUG torchrun --nproc_per_node gpu example_chat_completion.py > --ckpt_dir > --tokenizer_path tokenizer.model > --max_seq_len 512 --max_batch_size 8 to get error stack, I modified the example_chat_completion.py, but I got nothing, not any error stack has been written into log file. >from torch.distributed.elastic.multiprocessing.errors import record >def main(...): ### Output ## Runtime Environment - Model: llama-2-13b-chat - Using via huggingface?: no - OS: Ubuntu 22.04 - GPU VRAM: 48G - Number of GPUs: 2 - GPU Make: NVIDIA Corporation GA102 [GeForce RTX 3090] **Additional context** _オリジナルは が にポスト_",2023-11-15T20:36:36Z,llama,https://github.com/meta-llama/llama/issues/930 928,1995013773,torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 403477) of binary: /usr/bin/python3,"## Describe the bug I have download the llama-2-13b-chat, but when I run the commond as fallow, I got errors: >LOGLEVEL=DEBUG torchrun --nproc_per_node gpu example_chat_completion.py > --ckpt_dir > --tokenizer_path tokenizer.model > --max_seq_len 512 --max_batch_size 8 to get error stack, I modified the example_chat_completion.py, but I got nothing, not any error stack has been written into log file. >from torch.distributed.elastic.multiprocessing.errors import record >def main(...): ### Output ## Runtime Environment - Model: llama-2-13b-chat - Using via huggingface?: no - OS: Ubuntu 22.04 - GPU VRAM: 48G - Number of GPUs: 2 - GPU Make: NVIDIA Corporation GA102 [GeForce RTX 3090] **Additional context** ",2023-11-15T15:32:57Z,llama,https://github.com/meta-llama/llama/issues/928 926,1994598140,RYULEGALIZE,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here.",2023-11-15T11:30:54Z,llama,https://github.com/meta-llama/llama/issues/926 924,1994584781,How long does it take to get approved for Llama2?,"How long does it take to get approved for Llama2? I have tried with multiple email IDs but I still have not received any email granting access. I have verified my mail folders too. Pls advise. _オリジナルは が にポスト_ _オリジナルは が にポスト_",2023-11-15T11:22:18Z,llama,https://github.com/meta-llama/llama/issues/924 922,1994582334,"Hi,","Hi, I have downloaded Llama 2 and quantized it MacOs (llama.cpp). In the terminal, I am able to run the model with following command: -m -n 1024 --repeat_penalty 1.0 --color -i -r ""User:"" -f ` However I am confused how to load the model as well as the tokenizer in a Python script? In all tutorial I only see how the model is downloaded with Transformers like here: `from torch import cuda, bfloat16 import transformers model_id = bnb_config = transformers.BitsAndBytesConfig( load_in_4bit=True, bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_compute_dtype=bfloat16 ) hf_auth = 'AUTH_TOKEN' model_config = transformers.AutoConfig.from_pretrained( model_id, use_auth_token=hf_auth ) model = transformers.AutoModelForCausalLM.from_pretrained( model_id, trust_remote_code=True, config=model_config, quantization_config=bnb_config, device_map='auto', use_auth_token=hf_auth ) model.eval() print(f""Model loaded on {device}"") tokenizer = transformers.AutoTokenizer.from_pretrained( model_id, use_auth_token=hf_auth ) ` Do I only need to replace the ""model_id"" with my path? _オリジナルは が にポスト_",2023-11-15T11:20:39Z,llama,https://github.com/meta-llama/llama/issues/922 921,1994581296,"I've been thinking about how to obtain the hidden state of the last layer of Llama2 after calling the chat_completion method, but I'm not sure how to implement it. I hope to get some answers.","I've been thinking about how to obtain the hidden state of the last layer of Llama2 after calling the chat_completion method, but I'm not sure how to implement it. I hope to get some answers. _オリジナルは が にポスト_ _オリジナルは が にポスト_",2023-11-15T11:19:56Z,llama,https://github.com/meta-llama/llama/issues/921 920,1994580568,"I've been thinking about how to obtain the hidden state of the last layer of Llama2 after calling the chat_completion method, but I'm not sure how to implement it. I hope to get some answers.","I've been thinking about how to obtain the hidden state of the last layer of Llama2 after calling the chat_completion method, but I'm not sure how to implement it. I hope to get some answers. _オリジナルは が にポスト_",2023-11-15T11:19:27Z,llama,https://github.com/meta-llama/llama/issues/920 919,1994579890,Hello,"Hello Looking at the dataset list, which dataset does the prompts with an empty model belong to? For example: ""id"": ""wgByO4Y_0"", ""model"": """", Thanks _オリジナルは が にポスト_",2023-11-15T11:18:59Z,llama,https://github.com/meta-llama/llama/issues/919 917,1994575161,RYULEGALIZE,"**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug ### Minimal reproducible example ### Output ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Windows] - GPU VRAM: - Number of GPUs: - GPU Make: [eg: Nvidia, AMD, Intel] **Additional context** Add any other context about the problem or environment here.",2023-11-15T11:15:46Z,llama,https://github.com/meta-llama/llama/issues/917 914,1984880426,Not reply Proper Answer with system prompt with llama-2-7b-chat model.,"I am reaching out for guidance on utilizing the Llama-2-7B-Chat model for generating color palettes. Our aim is to create three distinct color palettes specifically designed for a poster's layout. These palette should include color codes for the poster's background (referred to as BG), Heading 1 (H1 text), and Heading 2 (H2 text). I have to show only one palettes for One Input. The system prompt we plan to use with the Llama-2-7B-Chat model is as follows: ""system_prompt"": ""Generate three distinct color palettes, each containing color codes for a poster's background (BG), Heading 1 (H1 text), and Heading 2 (H2 text). Provide a palette for both dark and light versions."" Why With use of this system prompt not give right answer?",2023-11-09T05:46:16Z,llama,https://github.com/meta-llama/llama/issues/914 911,1983894473,LLaMA 1 access form not working,"Hi, you provide a Google form for accessing LLaMA 1 weights but that does not work, either for me or for other PhD students in my department. Nothing happens upon filling the form and we have never heard back. An old GitHub issue on this topic is also not getting any responses. Could you please advise on how to proceed? We really need the 30B model to replicate the results of a paper, and that model size is only available for LLaMA 1.",2023-11-08T15:42:05Z,llama,https://github.com/meta-llama/llama/issues/911 910,1983632441,llava_v1_5_mix665k dataset,"Hello Looking at the dataset list, which dataset does the prompts with an empty model belong to? For example: ""id"": ""wgByO4Y_0"", ""model"": """", Thanks",2023-11-08T13:35:48Z,llama,https://github.com/meta-llama/llama/issues/910 909,1983565131,How to obtain the hidden state of the last layer of Llama2 after calling the chat_completion method?,"I've been thinking about how to obtain the hidden state of the last layer of Llama2 after calling the chat_completion method, but I'm not sure how to implement it. I hope to get some answers.",2023-11-08T13:01:18Z,llama,https://github.com/meta-llama/llama/issues/909 908,1981899490,Meta model Conversion to the Hugging Face friendly version,"Hi, I am trying to use the meta LLam2 I downloaded from Meta, but it has problem that needs to be converted to Hugging Face friendly version, I can not use the ones in Hugging Face because the GPU server I am using cannot connect to the internet. So, I saw the code for conversion, but it is not clear where to run the code. Also, the input path should be the directory where I have all the files with Tokenizer and mode, or the path that is just for the model and contains .chk .chk and .json for the weights? I would appreciate it if someone could help me with this problem, I stuck like 2 weeks.",2023-11-07T17:44:59Z,llama,https://github.com/meta-llama/llama/issues/908 907,1980313710,License of Llama2 derivative model,"Our customers are interested in training a model using Llama2 as a starting point. Before investing significant time and compute resources into this work, I wanted to request clarification on how derivative models should be licensed. Based on my reading of the Llama2 license especially section , my understanding is that any model derived from Llama2 - whether by fine-tuning the weights or training from scratch using the codebase - would need to be released under the LLAMA 2 Community License. These derivative models could not be released under a more permissive license like MIT or Apache 2.0. The key points are: - Models fine-tuned from Llama2 weights need the LLAMA 2 Community License. - New models trained from scratch using the Llama2 codebase also need the LLAMA 2 Community License. - The LLAMA 2 Community License does not allow derivative works to be re-licensed under permissive licenses like MIT or Apache 2.0 that were not written for AI systems. - If codebase is implemetend from scratch by referring Llama2 license, it does not need to inherit license because paper itself is not included to the ""Llama Materilas"" Please let me know if this interpretation is accurate. I want to be certain I understand the obligations for derivative works before proceeding with model development using Llama2. Thank you again for the clarification. ## Related issues * * ",2023-11-07T00:28:31Z,llama,https://github.com/meta-llama/llama/issues/907 906,1980288492,docs. Correct the URL to the FAQ.md file,correct the URL to the FAQ.md file,2023-11-07T00:00:57Z,llama,https://github.com/meta-llama/llama/pull/906 905,1979503009,Vertical lines on token embeddings visualization,"I've visualized token embedding weights (loaded from as image (4096x32000 pixels) and I spotted some vertical lines that I don't understand. Here's a crop of the full image with these vertical lines clearly visible: Any explanation why some dimensions of the token embedding would be special?",2023-11-06T16:01:12Z,llama,https://github.com/meta-llama/llama/issues/905 904,1979283874,ERROR: OSError:lama-2-7b-chat does not appear to have a file named config.json. ,"Hi, I am trying to run the Llama-7b chat that I already downloaded from Meta locally. I got this configuration error because I am using Transformers. I do not know how to run or change the code to be able to run with Transformers. Also, my local system is a remote GPU server, which does not have permission to connect to the internet. ### Output OSError:lama-2-7b-chat does not appear to have a file named config.json. ` ",2023-11-06T14:20:41Z,llama,https://github.com/meta-llama/llama/issues/904 903,1977803873,Authorization to translate documentation (to PT-BR),"Hello Llama 2's team. First of all, I want to deeply thank you for all your contributions to AI - and to the world. Llama 2 is undoubtedly a significant step to democratizing AI. Meta is probably the most important player in terms of making AI indeed accessible to **everyone** and not actually charging for it - and more, actually contributing to the academy and individual students by making it Open Source. Thank you! And speaking of democratizing AI and information. We keep a non-profit students community here in Brazil, where language is still a barrier, with a focus on bringing high-quality material about ML and AI to Portuguese, so that Brazilian students have access to it. Our community is called **BRAINS - Brazilian AI Networks**. I have recently read your post **BRAINS - Brazilian AI Networks** on Meta AI's blog. And it is a masterpiece. From start to end. Very well written, concise and valuable at the same time. I want to apologize if I'm on the wrong channel to make such a request. But I'd like your permission to translate this blog post and have it available on our community - with proper credits, of course! If it is not up to you to give such authorization, I'd deeply appreciate of you could point me to the right direction. I'm confident thousands of Brazilian students, like me, would benefit from having this content accessible in Portuguese. Once again, thank you very much. For everything you've done and are still doing for the AI community. And I hope we can take access of this blog post even further by translating it to other languages. #NoBrains #NoGains 🧠",2023-11-05T14:05:03Z,llama,https://github.com/meta-llama/llama/issues/903 902,1977729511,"Running PyTorch produces a ""failed to create process""","# CONTEXT 1. I am trying to run llama2 on my local machine. 2. I have followed the documentation available on the github repository **thank you in advance for your support** # what did I do? 1. install anaconda 2. clone the llama repository 3. download the models 4. create a virtual environment named llama2 5. install pytorch on Anaconda `conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia pip install -e . torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6 failed to create process. `",2023-11-05T10:40:02Z,llama,https://github.com/meta-llama/llama/issues/902 901,1975404507,AttributeError: 'LlamaForCausalLM' object has no attribute 'medusa_head'," ",2023-11-03T03:36:08Z,llama,https://github.com/meta-llama/llama/issues/901 900,1975295933,Fix key-value caching for seqlen != 1 (Issue #899),"This PR fixes a bug in the key-value caching as described in #899. Currently, a square attention mask is misapplied to the scores matrix despite not matching the shape of the scores matrix. This results in a runtime error. In a correct implementation, the decoder mask needs to describe how the new tokens interact with all the cached tokens. That is, the attention mask needs to be of shape , indicating how the token at row (representing token in the transformer model) attends to token . Accordingly, the matrix needs to mask entries where . This patch horizontally appends zeros to an upper-triangular mask of size to form the mask. This code was tested with the example in issue #899.",2023-11-03T01:26:59Z,llama,https://github.com/meta-llama/llama/pull/900 899,1975294207,Incorrect attention mask breaks key-value caching,"## Describe the bug There is currently a bug in the model relating to key-value caching. A square attention mask is misapplied to the scores matrix despite not matching the shape of the scores matrix. This results in a runtime error. ### Minimal reproducible example ### Output ### Expected Output ## Runtime Environment - Model: Any - Using via huggingface?: No - OS: Linux 6.1.55-1-lts - GPU VRAM: - Number of GPUs: - GPU Make: ",2023-11-03T01:24:00Z,llama,https://github.com/meta-llama/llama/issues/899 898,1974916743,"Llama2 access request not yet approved, been over a week","How long does it take to get approved for Llama2? I have tried with multiple email IDs but I still have not received any email granting access. I have verified my mail folders too. Pls advise.",2023-11-02T19:32:54Z,llama,https://github.com/meta-llama/llama/issues/898 897,1974773929,"Correct ""bug,"" typo to ""bug"", in README.md", ,2023-11-02T17:47:25Z,llama,https://github.com/meta-llama/llama/pull/897 896,1973976571,An error occurred while running llama-2-7b," _## Describe the bug When i try to run the llama-2-7b model through torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 I encounter the following error message Traceback (most recent call last): File line 11, in checkpoint = map_location='gpu') File line 1028, in load return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args) File line 1246, in _legacy_load magic_number = pickle_module.load(f, **pickle_load_args) _pickle.UnpicklingError: could not find MARK [2023-11-02 18 59,543] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 90675) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 806, in main run(args) File line 797, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 264, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_text_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-11-02_18 59 host : ai02-PR4910P rank : 0 (local_rank: 0) exitcode : 1 (pid: 90675) error_file: traceback : To enable traceback see: ============================================================ ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: [eg. Ubuntu - GPU VRAM: A100 - Number of GPUs:5 - GPU Make: Nvidia_ ",2023-11-02T10:44:57Z,llama,https://github.com/meta-llama/llama/issues/896 894,1973826305,Wha is difference between Llama-2-70b-hf and Llama-2-70b-fb format,The diffenence between hb and hf format lies in what?,2023-11-02T09:18:46Z,llama,https://github.com/meta-llama/llama/issues/894 893,1973526473,"It looks like your setup is missing fairscale. Also currently we only support NVIDIA GPUs, so maybe that might also be causing an issue."," It looks like your setup is missing fairscale. Also currently we only support NVIDIA GPUs, so maybe that might also be causing an issue. Closing this now, please reopen if you need to follow-up. _Originally posted by in ",2023-11-02T05:17:27Z,llama,https://github.com/meta-llama/llama/issues/893 892,1973495141,Update tokenizer2.py,"I added error handling for initialization, encoding, and decoding processes, and I used more informative logging to catch and report errors. This should make the code more robust and easier to debug.",2023-11-02T04:42:13Z,llama,https://github.com/meta-llama/llama/pull/892 889,1972890109,built some docs in case you are interested!, ,2023-11-01T18:32:54Z,llama,https://github.com/meta-llama/llama/issues/889 888,1972193290,Unable to download Llama 2 models,"I opened up conda. I created a new folder and cloned the llama github repository into it. In the llama repository, I first ran the command - I installed pytorch with this command - I then ran download.sh. When prompted to enter the URL from my email, I did. Note that I got the URL in my email inbox less than 24 hours ago (around 5-6 hours ago). Once I entered the link, I was asked to select which models I wanted to install. I pressed the enter key to install all models. The pop-up window that asked me to enter the URL closed automatically as soon as I chose which models I wanted to install. I had no sort of indication that the models were I've followed the instructions in the Quick Start section of the README file - - so I'm not sure where I've went wrong. would be appreciated! I have a WIndows 11 laptop with the NVIDIA GeForce RTX 3070 laptop GPU, 16GB of RAM. If there is a tutorial I should follow, please share them with me. I haven't found anything concrete yet. The README file is a bit vague.",2023-11-01T11:30:48Z,llama,https://github.com/meta-llama/llama/issues/888 887,1969003769, Llama-2-70b Model: Challenges with Long Token Sequences,"As the open-source Llama-2-70b model gains popularity within the community, questions arise about its performance on longer token sequences, potentially exceeding 2500 tokens. In my case, it seems to struggle after 500 tokens. Specifically, I'm referring to the Llama-2-70b model.",2023-10-30T18:44:55Z,llama,https://github.com/meta-llama/llama/issues/887 886,1968888419,Llama 2 checkpoint request no longer sending download link email,"Hi, Myself and other PhD students in my department are no longer receiving a download link email after requesting Llama 2 access through the form. We use our academic email address and up until ~3 days ago the email would be sent within seconds. We need a different model size which we hadn't downloaded before, hence why the new request, but no link is being sent anymore. Have tried for a couple of days now. Is the request form currently having issues? Thanks!",2023-10-30T17:33:06Z,llama,https://github.com/meta-llama/llama/issues/886 885,1968544457,"Click on ""Accept and Continue"" does NOTHING","## Describe the bug 1. I am about to accept the terms of conditions to download the Llama2 model 2. the process worked in the past AND I have received an email containing the link to download the model ### Minimal reproducible example 1. Click on 4. Fill the required contact details 5. Check ""Llama 2 & Llama Chat"" 6. Check ""Code Llama"" 7. Check ""I accept the terms and conditions"" 8. Click on ""Accept and Continue"" ### Output * **NOTHING**, the browser DOES NOT load a new page indicating the success of the operation ## Runtime Environment - Windows 11 - Browser : Chrome, Firefox, Bing - Mobile phone : OnePlus - Browser : Chrome ",2023-10-30T14:43:04Z,llama,https://github.com/meta-llama/llama/issues/885 884,1968452641,Custom personality,"I've been experimenting with Llama 2 7b chat for quite some time but have no idea how to make it have its own personality, are there any guide for that?",2023-10-30T14:03:05Z,llama,https://github.com/meta-llama/llama/issues/884 883,1966907011,it always report error when using llama2 model on mac,"i just follow this link to install llama2 model on mac m1,but it always report Errors: brew install llm llm install llm-llama-cpp llm install llama-cpp-python llm llama-cpp download-model --alias llama2-chat --alias l2c --llama2-chat llm -m l2c 'Tell me a joke about a llama' and result is Error: Could u helpe to find out why? ",2023-10-29T09:09:56Z,llama,https://github.com/meta-llama/llama/issues/883 882,1966722388,Problem with designing the prompt for my dataset - Multiplechoice QA,"Hello everyone, I have a dataset where I need to perform instruction fine-tuning using llama2. I am trying to make the prompt format right but I am still new so please do help me. In the dataset I want to finetne llama2 on I have: 1. A context where the answer should be infered. 2. A question. 3. Multiple choice. 4. Correct answer. and this is the structure I have created: is it correct or do I need to fix it? Thanks in a dvance ",2023-10-28T19:57:12Z,llama,https://github.com/meta-llama/llama/issues/882 881,1965708403,Error in ChildFailedError,"## Describe the bug torch.distributed.elastic.multiprocessing.errors.ChildFailedError: I have issue while running "" torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6 "" """""" ModuleNotFoundError: No module named 'fairscale' [2023-10-27 20 28,320] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 46862) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) ^^^^^^ File line 346, in wrapper return f(*args, **kwargs) ^^^^^^^^^^^^^^^^^^ File line 806, in main run(args) File line 797, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 264, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_chat_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-10-27_20 28 host : ob-90 rank : 0 (local_rank: 0) exitcode : 1 (pid: 46862) error_file: traceback : To enable traceback see: """""" ## Runtime Environment - Model: llama-2-7b - Using via huggingface?: no - OS: Linux Ubantu 22.04 - GPU VRAM: Mesa Intel® UHD Graphics 730 (ADL-S GT1) - Number of GPUs: 1 - GPU Make: Intel ",2023-10-27T15:00:39Z,llama,https://github.com/meta-llama/llama/issues/881 880,1965255099,meta-llama/Llama-2-7b-chat does not appear to have a file named config.json,"I have been trying to use HuggingFace Inference API for the model, But unfortunately, I'm getting an error Anyway, I do have access to this model. What is the correct way to use llama with API? Error: does not appear to have a file named config.json. check out ' for available files.""}",2023-10-27T10:44:04Z,llama,https://github.com/meta-llama/llama/issues/880 879,1964709434,llama2 is providing the wrong verse in English with wrong references and adding those word which isn't written in Holy Quran,Llama-2-70b-chat-hf does not provide a good result regarding the Holy Quran and its references. how is it's possible to get the exact Holy Quran verse using llama-2 even in English?,2023-10-27T03:43:24Z,llama,https://github.com/meta-llama/llama/issues/879 878,1964563529,torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 3221225477),"Hi, I have been attempting to launch Llama 2 with CPU. However have been stuck with the following error. `PS torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir llama-2-7b-chat --tokenizer_path tokenizer.model [2023-10-27 11 51,699] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs. [W socket.cpp:663] [c10d] The client socket has failed to connect to [AUSLF3NT9S311.MYBUSINESS.AU]:29500 (system error: 10049 - The requested address is not valid in its context.). [W socket.cpp:663] [c10d] The client socket has failed to connect to [AUSLF3NT9S311.MYBUSINESS.AU]:29500 (system error: 10049 - The requested address is not valid in its context.). > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 C 614: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at _C._set_default_tensor_type(t) [2023-10-27 11 52,150] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 3221225477) local_rank: 0 (pid: 7668) of binary: Traceback (most recent call last): File """", line 198, in _run_module_as_main File """", line 88, in _run_code File line 7, in File line 346, in wrapper return f(*args, **kwargs) ^^^^^^^^^^^^^^^^^^ File line 806, in main run(args) File line 797, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 264, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_chat_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-10-27_11 52 host : AUSLF3NT9S311.MYBUSINESS.AU rank : 0 (local_rank: 0) exitcode : 3221225477 (pid: 7668) error_file: traceback : To enable traceback see: ============================================================` system specs Processor 12th Gen Intel(R) Core(TM) i5-1245U, 1600 Mhz, 10 Core(s), 12 Logical Processor(s) Installed Physical Memory (RAM) 8.00 GB Total Virtual Memory 30.3 GB GPU Iris XE Graphics I have checked other issue posts but have yet to find a solution. Are the requirements to run it beyond my current computers capabilities?",2023-10-27T00:26:49Z,llama,https://github.com/meta-llama/llama/issues/878 877,1963202971, torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 0 (pid: 760),"Hi everybody, I tried to deploy the llama2 model in env: CUDA version: 12.1 ID of current CUDA device: 0 Name of current CUDA device: Quadro P4000 but I found the following issue, has someone an idea of what's wrong? torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6 > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 [2023-10-26 11 24,266] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 0 (pid: 2283) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 806, in main run(args) File line 797, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 264, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ===================================================== example_chat_completion.py FAILED ----------------------------------------------------- Failures: Root Cause (first observed failure): [0]: time : 2023-10-26_11 22 host : rank : 0 (local_rank: 0) exitcode : -9 (pid: 2283) error_file: traceback : Signal 9 (SIGKILL) received by PID 2283 ## Runtime Environment - Model: [eg: ] - Using via huggingface?: - OS: Ubuntu - GPU VRAM: 8GB - Number of GPUs: 4 - GPU Make: Nvidia Quadro P4000 ",2023-10-26T10:26:34Z,llama,https://github.com/meta-llama/llama/issues/877 875,1960663714,Run Llama 2 locally in Python script,"Hi, I have downloaded Llama 2 and quantized it MacOs (llama.cpp). In the terminal, I am able to run the model with following command: -m -n 1024 --repeat_penalty 1.0 --color -i -r ""User:"" -f ` However I am confused how to load the model as well as the tokenizer in a Python script? In all tutorial I only see how the model is downloaded with Transformers like here: `from torch import cuda, bfloat16 import transformers model_id = bnb_config = transformers.BitsAndBytesConfig( load_in_4bit=True, bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_compute_dtype=bfloat16 ) hf_auth = 'AUTH_TOKEN' model_config = transformers.AutoConfig.from_pretrained( model_id, use_auth_token=hf_auth ) model = transformers.AutoModelForCausalLM.from_pretrained( model_id, trust_remote_code=True, config=model_config, quantization_config=bnb_config, device_map='auto', use_auth_token=hf_auth ) model.eval() print(f""Model loaded on {device}"") tokenizer = transformers.AutoTokenizer.from_pretrained( model_id, use_auth_token=hf_auth ) ` Do I only need to replace the ""model_id"" with my path?",2023-10-25T06:45:54Z,llama,https://github.com/meta-llama/llama/issues/875 874,1960041518,How to approch quantization and fine-tuning with the llama2 7B chat model with the code given in this repository., ,2023-10-24T20:41:46Z,llama,https://github.com/meta-llama/llama/issues/874 873,1959374628,"Installing llama-2 model closes the window, does nothing else","**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the FAQs and FAQs** ## Describe the bug When I ran Download.sh, I inputted the URL, and selected to download the 7B model. However, upon running, the window closes immediately. Is this meant to happen? What do I do from here? Remember to wrap the code and outputs in . ### Minimal reproducible example ### Output ## Runtime Environment - Model: None - Using via huggingface?: no - OS: Windows - GPU VRAM: 8GB - Number of GPUs: 1 - GPU Make: NVIDIA **Additional context** Download.sh was ran from the Git command window, because windows CMD didn't want to. ",2023-10-24T14:20:40Z,llama,https://github.com/meta-llama/llama/issues/873 872,1958430615,Llama2 Model Access Issue,"Hello, I have filled out the request form to access Llama 2 models. However, I have not received any response. Could someone please help in providing the access ? Thanks again ! ",2023-10-24T03:33:41Z,llama,https://github.com/meta-llama/llama/issues/872 871,1956279018,"hi,could you help me for llama2-13b-chat-hf"," ### Output ## Runtime Environment - Model: [eg: ] llama2-13b-chat-hf - Using via huggingface?: yes - OS: [eg. Windows] Linux - GPU VRAM: - Number of GPUs: 1 - GPU Make: [eg: Nvidia, AMD, Intel] Nvidia **Additional context** Add any other context about the problem or environment here. ",2023-10-23T03:25:50Z,llama,https://github.com/meta-llama/llama/issues/871 867,1952765032,SQUAD evalution,"Hello, I'm working on evaluating llama-2-70b-chat with respect to the SQUAD dataset, but it seems like the EM and F1 score don't match the scores mentioned in the paper. Not sure what I'm doing differently, could you clarify on how many samples of the SQUAD dataset is the model being evaluated on and what the system prompt looks like for this particular task. Thank you",2023-10-19T17:40:02Z,llama,https://github.com/meta-llama/llama/issues/867 866,1952219175,How can we use the internet mode in llama-2-70b-chat-hf,"How can we use the internet mode in llama-270-b-chat-hf. There's any reference link available or any thing which help me further for study that",2023-10-19T13:07:30Z,llama,https://github.com/meta-llama/llama/issues/866 865,1951114731,TypeError: __init__() got an unexpected keyword argument 'quantizer'," how to fix it",2023-10-19T03:14:50Z,llama,https://github.com/meta-llama/llama/issues/865 864,1949337189,ERROR: torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 0 (pid: 48045) of binary: /usr/bin/python3,"## Describe the bug Hi guys, I'm having problems running the Llama-2-7B model. The hardware configuration I have listed is below. I don't have a GPU, only a CPU. I was able to run it exactly once, but after that I couldn't run it anymore. This is all of steps I did: - Clone repo that I have attached it below. - Get download link from and downloaded model llama-7B. - Run download.sh by . - In the top dir, I ran: All of steps above are run correctly, until next step... - . When I run this command line above, an error appears: ### Output `ERROR: torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 0 (pid: 48045) of binary: [...] example_text_completion.py ERROR` ## Runtime Environment - Model: llama-2-7b from - Using via huggingface?: No - OS: Ubuntu 20.04 - CPU: Intel(R) Xeon(R) Bronze 3204 CPU 1.90GHz - GPU VRAM: 32GB RAM - Number of GPUs: 0 - GPU Make: None GPU, only CPU. If anyone has encountered a similar situation and fixed the error, please show me! Thanks a lot. ",2023-10-18T09:53:22Z,llama,https://github.com/meta-llama/llama/issues/864 863,1949075129,Queston: What is the difference between llama2-7B and llama-7B,"I applied for both, but get llama2 only. If it is the same for interfaces? I gonna use it in NEXT_GPT",2023-10-18T07:43:26Z,llama,https://github.com/meta-llama/llama/issues/863 862,1947110040,How to deploy non-huggingface format model online?,"I want to deploy the non-huggingface model on my server. But for model larger than 13b, it need to be run multi-process. But the text-generation-inference project must use huggingface model. Is there a way to deploy the non-huggingface model online?",2023-10-17T10:40:30Z,llama,https://github.com/meta-llama/llama/issues/862 861,1946454115,Response for Llama2 Access,"Hi! I submitted the request to access Llama-2 but I got no response. I'm working on a research about NLP, Can you help me?",2023-10-17T03:13:52Z,llama,https://github.com/meta-llama/llama/issues/861 859,1943656872,"[closes #858] change ""Content Length"" to ""Context Length MODEL_CARD.md"," In the table comparing Model Architectures ""Content Length"" should be ""Context Length""",2023-10-15T02:33:45Z,llama,https://github.com/meta-llama/llama/pull/859 858,1943655369,"On MODEL_CARD.md ""Content Length"" should be ""Context Length"" ","In the table comparing Model Architectures ""Content Length"" should be ""Context Length""",2023-10-15T02:28:33Z,llama,https://github.com/meta-llama/llama/issues/858 857,1942326743,torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.,"Hi, I was trying to run llama2 in my local computer (Windows 10, 64 GB RAM, GPU 0 Intel(R) Iris (R) Xe Graphics). Got following error - 1. raise RuntimeError(""Distributed package doesn't have NCCL built in"") Resolved by import torch torch.distributed.init_process_group(""gloo"") 2. torch._C._cuda_setDevice(device) AttributeError: module 'torch._C' has no attribute '_cuda_setDevice' Resolved by commenting out if device >= 0: torch._C._cuda_setDevice(device) in 3. TypeError: type torch.cuda.HalfTensor not available. Torch not compiled with CUDA enabled. What should I do know? Is it even possible to make llama work in a computer with Intel GPU?",2023-10-13T17:16:16Z,llama,https://github.com/meta-llama/llama/issues/857 856,1941613483,torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 0 (pid: 62498) of binary,"(llama) znr torchrun --nproc_per_node 1 example_chat_completion.py > --ckpt_dir > --tokenizer_path > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 [2023-10-13 17 02,544] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 0 (pid: 62498) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 806, in main run(args) File line 797, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 264, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ====================================================== example_chat_completion.py FAILED ------------------------------------------------------ Failures: ------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-10-13_17 02 host : znr-OMEN-by-HP-Laptop-17-cm2xxx rank : 0 (local_rank: 0) exitcode : -9 (pid: 62498) error_file: traceback : Signal 9 (SIGKILL) received by PID 62498 ================================ ====================== this issue confused me long time",2023-10-13T09:42:40Z,llama,https://github.com/meta-llama/llama/issues/856 855,1939047867,The license for non-English services of LLaMA2.,"The LLaMA2 license specifies that 'unrestricted commercial use is allowed as long as the Monthly Active Users (MAU) do not exceed 700 million.' Does this mean there are no restrictions on commercial use when offering LLaMA2 in languages other than English, such as Korean or Chinese, as long as the MAU does not exceed 700 million?",2023-10-12T02:10:29Z,llama,https://github.com/meta-llama/llama/issues/855 854,1938945654,RuntimeError: The expanded size of the tensor (22528) must match the existing size (1024) at non-singleton dimension 0,"Hi all, We are running a setup where we utilize runpod, where we can download and utilize the 7b and 13b model, while below error is present when we try to utilize the 70b model. Any idea what can cause this? 2023-10-10T10 50.329412059Z File line 131, in load_weights 2023-10-10T10 50.329412900Z module._parameters[param_name][value.shape[0] * 2 :] = value 2023-10-10T10 50.329413751Z RuntimeError: The expanded size of the tensor (22528) must match the existing size (1024) at non-singleton dimension 0. Target sizes: [22528, 8192]. Tensor sizes: [1024, 8192] 2023-10-10T10 50.329415123Z rank=0 2023-10-10T10 51.275395485Z Error: ShardCannotStart 2023-10-10T10 51.275419161Z 2023-10-10T10 51.275248Z ERROR text_generation_launcher: Shard 0 failed to start: 2023-10-10T10 51.275440002Z Traceback (most recent call last): 2023-10-10T10 51.275441445Z 2023-10-10T10 51.275442767Z File line 8, in 2023-10-10T10 51.275444099Z sys.exit(app()) 2023-10-10T10 51.275444990Z 2023-10-10T10 51.275445931Z File line 67, in serve 2023-10-10T10 51.275447323Z server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path) 2023-10-10T10 51.275448205Z 2023-10-10T10 51.275449026Z File line 155, in serve 2023-10-10T10 51.275450418Z asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code)) 2023-10-10T10 51.275451249Z 2023-10-10T10 51.275452041Z File line 44, in run 2023-10-10T10 51.275452792Z return loop.run_until_complete(main) 2023-10-10T10 51.275453703Z 2023-10-10T10 51.275454504Z File line 647, in run_until_complete 2023-10-10T10 51.275455426Z return future.result() 2023-10-10T10 51.275456688Z 2023-10-10T10 51.275457349Z File line 124, in serve_inner 2023-10-10T10 51.275458360Z model = get_model(model_id, revision, sharded, quantize, trust_remote_code) 2023-10-10T10 51.275459191Z 2023-10-10T10 51.275459953Z File line 246, in get_model 2023-10-10T10 51.275474935Z return llama_cls( 2023-10-10T10 51.275475937Z 2023-10-10T10 51.275476778Z File line 67, in __init__ 2023-10-10T10 51.275477729Z self.load_weights(model, filenames, quantize, device, dtype) 2023-10-10T10 51.275479121Z 2023-10-10T10 51.275479953Z File line 131, in load_weights 2023-10-10T10 51.275480874Z module._parameters[param_name][value.shape[0] * 2 :] = value 2023-10-10T10 51.275481685Z 2023-10-10T10 51.275482547Z RuntimeError: The expanded size of the tensor (22528) must match the existing size (1024) at non-singleton dimension 0. Target sizes: [22528, 8192]. Tensor sizes: [1024, 8192] 2023-10-10T10 51.275484029Z ",2023-10-12T00:00:39Z,llama,https://github.com/meta-llama/llama/issues/854 853,1938878035,Download.sh not working,"When I run the download.sh file using wsl.exe, I get the following issue: `download.sh: line 2: command not found download.sh: line 5: command not found : invalid optione 6: set: - set: usage: set [-abefhkmnptuvxBCHP] [-o option-name] [--] [arg ...] download.sh: line 7: command not found ': not a valid identifier: `PRESIGNED_URL ': not a valid identifierd: `MODEL_SIZE download.sh: line 13: command not found download.sh: line 27: syntax error near unexpected token 'ownload.sh: line 27: ",2023-10-11T23:07:26Z,llama,https://github.com/meta-llama/llama/issues/853 852,1938644036,How to prevent answer generation from being interrupted,"When I use the llama2 model to generate a long answer, it always interrupts in the middle and does not finish this answer. How do I prevent this situation? This is my setting: ",2023-10-11T20:11:50Z,llama,https://github.com/meta-llama/llama/issues/852 851,1938542734,Faq updates,Adding OS and download questions to FAQs,2023-10-11T19:11:21Z,llama,https://github.com/meta-llama/llama/pull/851 850,1937210144,Only can run in Linux?,"**I try this commad on my windows computer:** > torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir llama-2-7b-chat --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6 **Error:** > [2023-10-11 16 34,795] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs. > [W socket.cpp:663] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The address of the request is invalid in its context 。). > [W socket.cpp:663] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The address of the request is invalid in its context。). **My hardware:** > nvidia-3090-24G; Window 10 **Is that mean I can not run llama in windows system or single GPU? Any pro can reply to me? Appreciate!**",2023-10-11T08:46:12Z,llama,https://github.com/meta-llama/llama/issues/850 849,1936726023,"hello, any solution on llama download, always encounter ""tokenizer_checklist.chk: no properly formatted MD5 checksum lines found""","I followed the instruction and didn't get a chance to paste download url. Solutions in the closed issues seem not work for me. $ Enter the URL from email: gaoyuze.m Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 70B Downloading LICENSE and Acceptable Usage Policy --2023-10-11 11 46-- Resolving gmail.com (gmail.com)... 64.233.170.19, 64.233.170.18, 64.233.170.83, ... Connecting to gmail.com (gmail.com)|64.233.170.19|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Location: [following] --2023-10-11 11 46-- Resolving mail.google.com (mail.google.com)... 142.251.12.18, 142.251.12.83, 142.251.12.17, ... Connecting to mail.google.com (mail.google.com)|142.251.12.18|:443... connected. HTTP request sent, awaiting response... 302 Moved Temporarily Location: https [following] --2023-10-11 11 46-- https Resolving accounts.google.com (accounts.google.com)... 172.217.194.84 Connecting to accounts.google.com (accounts.google.com)|172.217.194.84|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https [following] --2023-10-11 11 46-- https Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 302 Moved Temporarily Location: [following] --2023-10-11 11 46-- Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 200 OK Length: unspecified Saving to: [ <=> ] 574.30K in 0.04s 2023-10-11 11 46 (13.2 - saved [588741] --2023-10-11 11 46-- Resolving gmail.com (gmail.com)... 64.233.170.18, 64.233.170.83, 64.233.170.19, ... Connecting to gmail.com (gmail.com)|64.233.170.18|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Location: [following] --2023-10-11 11 46-- Resolving mail.google.com (mail.google.com)... 142.251.12.83, 142.251.12.17, 142.251.12.19, ... Connecting to mail.google.com (mail.google.com)|142.251.12.83|:443... connected. HTTP request sent, awaiting response... 302 Moved Temporarily Location: https [following] --2023-10-11 11 46-- https Resolving accounts.google.com (accounts.google.com)... 172.217.194.84 Connecting to accounts.google.com (accounts.google.com)|172.217.194.84|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https [following] --2023-10-11 11 46-- https Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 302 Moved Temporarily Location: [following] --2023-10-11 11 46-- Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 200 OK Length: unspecified Saving to: [ <=> ] 573.72K in 0.03s 2023-10-11 11 46 (18.4 - saved [587718] Downloading tokenizer --2023-10-11 11 46-- Resolving gmail.com (gmail.com)... 64.233.170.18, 64.233.170.83, 64.233.170.17, ... Connecting to gmail.com (gmail.com)|64.233.170.18|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Location: [following] --2023-10-11 11 46-- Resolving mail.google.com (mail.google.com)... 142.251.12.17, 142.251.12.83, 142.251.12.19, ... Connecting to mail.google.com (mail.google.com)|142.251.12.17|:443... connected. HTTP request sent, awaiting response... 302 Moved Temporarily Location: https [following] --2023-10-11 11 47-- https Resolving accounts.google.com (accounts.google.com)... 172.217.194.84 Connecting to accounts.google.com (accounts.google.com)|172.217.194.84|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https [following] --2023-10-11 11 47-- https Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 302 Moved Temporarily Location: [following] --2023-10-11 11 47-- Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 200 OK Length: unspecified Saving to: [ <=> ] 573.68K in 0.05s 2023-10-11 11 47 (10.9 - saved [587634] --2023-10-11 11 47-- Resolving gmail.com (gmail.com)... 64.233.170.83, 64.233.170.17, 64.233.170.19, ... Connecting to gmail.com (gmail.com)|64.233.170.83|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Location: [following] --2023-10-11 11 47-- Resolving mail.google.com (mail.google.com)... 142.251.12.17, 142.251.12.19, 142.251.12.18, ... Connecting to mail.google.com (mail.google.com)|142.251.12.17|:443... connected. HTTP request sent, awaiting response... 302 Moved Temporarily Location: https [following] --2023-10-11 11 49-- https Resolving accounts.google.com (accounts.google.com)... 172.217.194.84 Connecting to accounts.google.com (accounts.google.com)|172.217.194.84|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https [following] --2023-10-11 11 49-- https Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 302 Moved Temporarily Location: [following] --2023-10-11 11 49-- Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 200 OK Length: unspecified Saving to: [ <=> ] 574.36K in 0.07s 2023-10-11 11 49 (8.27 - saved [588147] md5sum: tokenizer_checklist.chk: no properly formatted MD5 checksum lines found",2023-10-11T03:27:41Z,llama,https://github.com/meta-llama/llama/issues/849 848,1936600304,"llama2 model meory always go up, any mechanism to trigger gc?", ,2023-10-11T01:22:38Z,llama,https://github.com/meta-llama/llama/issues/848 847,1934864217,Llama 2 7B Inference time issue,"hi, How do i improve the inference time of my Llama2 7B model?.... i used BitsAndBytesConfig also but this does not seem to fasten the inference time! code: `name = tokenizer = AutoTokenizer.from_pretrained(name) tokenizer.pad_token_id = tokenizer.eos_token_id # for open-ended generation bnb_config = BitsAndBytesConfig( load_in_4bit=True, bnb_4bit_quant_type=""nf4"", bnb_4bit_compute_dtype=torch.float16, bnb_4bit_use_double_quant=True, ) model = AutoModelForCausalLM.from_pretrained( name, device_map=""auto"", quantization_config=bnb_config, trust_remote_code=True, load_in_8bit=True, ) generation_pipe = pipeline( ""text-generation"", model=model, tokenizer=tokenizer, num_return_sequences=1, do_sample=True, eos_token_id=tokenizer.eos_token_id, device_map=""auto"", # finds GPU max_length=2000, top_k=10, top_p=0.9, temperature = 0.8, batch_size=1, ) llm = HuggingFacePipeline(pipeline = generation_pipe)`",2023-10-10T09:18:15Z,llama,https://github.com/meta-llama/llama/issues/847 846,1934576319,How to do conversation with the llama-2-7B-chat model.,"Hey, hope you doing well. I am able to run inference on the llama-2-7B-chat model successfully with the example python script provided. I am new to working and experimenting with large language models. I wanted to know how can i do conversation with the model where the model will consider its previous user prompts chat completion context too for answering next user prompt. I am currently experimenting with the dialogue list present in the example python script but it seems that i will have to go through all of the code and make changes in it. Any guidance is much appreciated. Thank you!",2023-10-10T07:15:43Z,llama,https://github.com/meta-llama/llama/issues/846 845,1934161047, How to get PRESIGNED URL , ,2023-10-10T02:12:29Z,llama,https://github.com/meta-llama/llama/issues/845 844,1932249161,No Confirmation Response for Llama2 Access. Request Made Several Days Ago,"Hi, I filled out the license form on Meta's website for access to the Llama2 model as instructed on the website. However, I have not received any confirmation or permission so far. I would like to know if my request got through and is being processed, or if there's something else I need to do to get access, especially since it has been nearly 2 weeks since the request. Thanks for any help.",2023-10-09T03:42:14Z,llama,https://github.com/meta-llama/llama/issues/844 843,1931995467,Fire module not found when running example script. Please help.,"Traceback (most recent call last): File ""example_chat_completion.py"", line 4, in import fire ModuleNotFoundError: No module named 'fire' ERROR failed (exitcode: 1) local_rank: 0 (pid: 2798750) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_chat_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-10-08_18 49 host : surbhi rank : 0 (local_rank: 0) exitcode : 1 (pid: 2798750) error_file: traceback : To enable traceback see: ============================================================ I have installed fire using pip but it always throw ModuleNotFoundError. I am using remote ssh to container with python environment and python 3.11.x and fire 0.5.0 install. What am i missing?",2023-10-08T18:37:05Z,llama,https://github.com/meta-llama/llama/issues/843 842,1931811824,Clarified and Improved Sentence," **Description:** I have rephrased a sentence in the readme.md file to make it more clear and engaging. The original sentence was: ""We are unlocking the power of large language models. Our latest version of Llama is now accessible to individuals, creators, researchers, and businesses of all sizes so that they can experiment, innovate, and scale their ideas responsibly."" The rephrased sentence is: ""We're empowering individuals, creators, researchers, and businesses of all sizes with access to the latest version of Llama. Our goal is to inspire innovation and responsible scaling, making it easy for you to unlock the full potential of large language models."" I believe this change will make the message more inviting and encourage users to explore Llama's capabilities. **Related Issue:** Please link to any related issue or leave this section empty if there isn't one. **Checklist:** - [ ] I have tested the changes locally. - [ ] I have made sure that the new sentence does not introduce any grammatical or typographical errors. - [ ] I have created this pull request from my forked repository. ",2023-10-08T13:46:42Z,llama,https://github.com/meta-llama/llama/pull/842 840,1930359389,download.sh error http://: Invalid host name,"When I try to execute the download.sh file, first I could not get wget to work (Windows 11). I put the wget.exe file to my path which made the error go away. Now I have a new error: `Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 7B-chat Downloading LICENSE and Acceptable Usage Policy http Invalid host name. ` I already tried using a new key (URL) but that does also not seem to work. Sadly I could not find out how to fix my issue so any help would be greatly appreciated. ",2023-10-06T14:56:39Z,llama,https://github.com/meta-llama/llama/issues/840 837,1923463412,Error when using LM Studio,"Hi I encountered this error, when using LM Studio. Kindly suggest what happens and how to tweak the parameters: Thanks",2023-10-03T07:07:48Z,llama,https://github.com/meta-llama/llama/issues/837 836,1921264290,Failed to run llama2-13B but it worked with llama2-7B,"It worked with llama2-7b. But when I tried to run the **llama2-13b** model using this , it didn't work. Error log in brief: #### Full error log #### System Specs i9 9900K + 16G DDR4 (with 16GB swap) + 2080ti (modded version with 22GB VRAM, the card runs smoothly on Windows and Linux) OS: Ubuntu 22.04 x86_64 Environments: From miniconda `conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia log *-display description: VGA compatible controller product: TU102 [GeForce RTX 2080 Ti Rev. A] vendor: NVIDIA Corporation physical id: 0 bus info: pci 00.0 version: a1 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress vga_controller bus_master cap_list rom configuration: driver=nvidia latency=0 resources: iomemory:2f0-2ef iomemory:2f0-2ef irq:186 memory:de000000-deffffff memory:2fe0000000-2fefffffff memory:2ff0000000-2ff1ffffff ioport:e000(size=128) memory:c0000-dffff *-display description: Display controller product: CoffeeLake-S GT2 [UHD Graphics 630] vendor: Intel Corporation physical id: 2 bus info: pci 02.0 version: 02 width: 64 bits clock: 33MHz capabilities: pciexpress msi pm bus_master cap_list configuration: driver=i915 latency=0 resources: iomemory:2f0-2ef iomemory:2f0-2ef irq:185 memory:2ffe000000-2ffeffffff memory:2fd0000000-2fdfffffff ioport:f000(size=64) *-graphics product: EFI VGA physical id: 2 logical name: capabilities: fb configuration: depth=32 resolution=2560,1080 python import torch device_count = torch.cuda.device_count() print(f""Number of available devices: {device_count}"") for i in range(device_count): print(f""Device {i}: {torch.cuda.get_device_name(i)}"") log +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.113.01 Driver Version: 535.113.01 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf | Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |=========================================+======================+======================| | 0 NVIDIA GeForce RTX 2080 Ti Off | 00000000 00.0 On | | | 41% 34C P8 30W 260W | 288MiB 22528MiB | 12% Default | | | | | +-----------------------------------------+----------------------+----------------------+ +---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | 0 2216 G 165MiB | | 0 2338 G 34MiB | | 0 34805 G ...26077060,3793940789578302769,262144 82MiB | | 0 44004 G 3MiB | +---------------------------------------------------------------------------------------+ ` #### My attempt NO.2 Changed to Pytorch nightly and cuda 12.1 support. My Linux is using Nvidia driver version 535.113.01 with cuda 12.2 support. Pytorch version: 2.2.0.dev20231001 Same error. #### My attempt NO.3 Downgrade the Linux driver? (Not tested yet) #### My attempt NO.4 Use the Docker version Pytorch and CUDA inside a docker instance. After downloading the docker image, i started a docker instance by doing so Error How to run llama2-13B-chat or 70B with a RTX graphics card of 22GB RAM? Thanks in advance! ",2023-10-02T05:17:41Z,llama,https://github.com/meta-llama/llama/issues/836 835,1920254509,Making model personalised,"I came across this post by post where it has been described as how to get rid of ""..as an AI language model..."" and making the model more personalised. This method needs each and every token id to be known in prior. Is there any way this issue can be circumvented and we don't require any token ids, but instead change the logits as done by post? I'm not looking for fine-tuning based methods.",2023-09-30T13:15:43Z,llama,https://github.com/meta-llama/llama/issues/835 834,1920115822,"I am successfully running ""llama-2-7b-chat"" but have problems with ""llama-2-13b-chat"" and ""llama-2-70b-chat""","**My hardware** When running this is the output:


**example_chat_completion.py** file used below with all models is unmodified from original repo:


**llama-2-7b-chat** I am following README.md and succesfully run ""llama-2-7b-chat"" model with: With ""llama-2-7b-chat"" everything works well.


**llama-2-13b-chat** Now I am trying to modify code above that runs ""llama-2-7b-chat"" to run ""llama-2-13b-chat"": After running it this is the output: and after nothing happens.


**llama-2-70b-chat** Also, I am trying to run ""llama-2-70b-chat"": but getting following error:


**Question:** How to correctly run ""llama-2-13b-chat"" and ""llama-2-70b-chat""?",2023-09-30T04:36:09Z,llama,https://github.com/meta-llama/llama/issues/834 833,1920115142,"every time I enter the link from my email and select my midel i get ""download.sh: line 19: wget: command not found""","Sorry, trying to figure this out while being new here. Any help would be great.",2023-09-30T04:33:05Z,llama,https://github.com/meta-llama/llama/issues/833 831,1918397968,Some help with very slow performance,"Hi all. I've done the installation in my PC. My configuration is: - 2 * Xeon E5-2678 v3 (24 cores 48 threads) at 2,80Ghz - 64 Gb Ram - NVIDIA GeForce GTX 1080 - 4 SSD I use WSL2 on Windows 10 operating system I use the 7B file and i start with this command: torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6 after seen the result when i try to launch the system is the system remains stucked and never shows the "">>>User"" text, i have taken a file from a user here and changed the example_chat_completion.py to this: it takes about 25 seconds to start, the GPU is at 100% and when i type something as >>>User: the system respond after 3 or 4 minutes or more... After read that some users tries with CPU and also some similar GPU with aceptable results, im totally lost and stopped. What can i do to have better response? Where is my error? I will appreciate some help. Thanks in advance",2023-09-28T23:33:59Z,llama,https://github.com/meta-llama/llama/issues/831 830,1917735874,ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 16079) ,"Hi , I try deploy llama2 today , and found the issue : (llama) [root llama]# torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 6 > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Loaded in 12.05 seconds Traceback (most recent call last): File line 69, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 56, in main results = generator.text_completion( File line 265, in text_completion generation_tokens, generation_logprobs = self.generate( File line 115, in decorate_context return func(*args, **kwargs) File line 165, in generate total_len = min(params.max_seq_len, max_gen_len + max_prompt_len) TypeError: can only concatenate str (not ""int"") to str ERROR failed (exitcode: 1) local_rank: 0 (pid: 16079) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_text_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-09-28_22 34 host : iZbp1iobggdz6jrvvlgpx4Z rank : 0 (local_rank: 0) exitcode : 1 (pid: 16079) error_file: traceback : To enable traceback see: Please help me resolve the issue Thank you",2023-09-28T14:59:27Z,llama,https://github.com/meta-llama/llama/issues/830 828,1914014408,Why the extra is coming after tokenizer,"Hi , I am trying to run DataCollatorForCompletionOnlyLM . But for that i need to get the start of response . I am trying to get the response key but it looks like tokenizer is not deterministic. Attaching image. Encoding & decoding is changing the input string. ",2023-09-26T17:52:46Z,llama,https://github.com/meta-llama/llama/issues/828 827,1911299708,"AssertError: (6, 4)"," when i running `torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir --tokenizer_path --max_seq_len 128 --max_batch_size 6 ` i have set max_batch_size 6,but print it still 4. ihave no idea about this . i dont know bsz's meaning",2023-09-25T11:43:09Z,llama,https://github.com/meta-llama/llama/issues/827 826,1910315072,Cannot download anymore?,"Got the following message when run the download.sh Connecting to download.llamameta.net (download.llamameta.net)|13.249.205.86|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2023-09-24 11 28 ERROR 403: Forbidden.",2023-09-24T17:31:05Z,llama,https://github.com/meta-llama/llama/issues/826 825,1909833040,AssertionError: no checkpoint files found in --ckpt_dir,"I am attempting to run Llama 2 per the basic instructions: and receive the following: Looking at the following error, I have made sure that I am in the directory in which the llama models were downloaded into: #577 In the llama-2-7b-chat folder I have: checklist.chk, consolidated.00.pth, and params.json Any help would be appreciated.",2023-09-23T11:19:16Z,llama,https://github.com/meta-llama/llama/issues/825 824,1909749633,access to llama-v1 weights (no response),"Hi, I need access to the llama-v1 weights (other work relies on llama-v1). I filled out the Google form but haven't heard back. Are you no longer supporting the v1 model? Best, Orr",2023-09-23T07:19:38Z,llama,https://github.com/meta-llama/llama/issues/824 822,1909660703,"Add ""--continue"" flag to wget for model binary in order to resume dl","""--continue"" added to allow downloads to resume with aim of saving bandwidth. This change brings behavior for this element of the download script into line with codellama download script",2023-09-23T01:26:41Z,llama,https://github.com/meta-llama/llama/pull/822 821,1909651590,Is there any way to generate embeddings?,Is there any way to create embeddings with llama like OpenAI embedding endpoints?,2023-09-23T00:58:11Z,llama,https://github.com/meta-llama/llama/issues/821 820,1909177825,HuggingFace request not yet approved,"Same problem as #724 and #750 and #812 I filled in the META form and immediately received an email with the licence and confirmation of access. I also filled in HF request; this was over a week ago and is still pending. The META licence email is now my primary HF email though I have additional emails also associated to my HF account. I understand from the other two issues this is handled and processed by Meta, but is there any way to trigger a review of the request? I don't see any way from HuggingFace to cancel the request and submit it again, for instance. ",2023-09-22T16:09:04Z,llama,https://github.com/meta-llama/llama/issues/820 819,1908453750,What are the pc configuration to run llama2 70B?,"My current: Still shows out of memory error for CPU. Thanks",2023-09-22T08:38:19Z,llama,https://github.com/meta-llama/llama/issues/819 818,1908380758,Bash Download.sh not working when trying to download the llama 2 model ,"Hi, I have tried bash download.sh, several times. It keeps providing me with this error. download.sh: line 1: payload false: command not found I am running on a 2019 MacBook Pro Intel i9. Any help is greatly appreciated. Thanks in advance ",2023-09-22T07:54:26Z,llama,https://github.com/meta-llama/llama/issues/818 817,1906437474,Update CONTRIBUTING.md and Fix typos in docstrings, ,2023-09-21T08:43:06Z,llama,https://github.com/meta-llama/llama/pull/817 816,1906330784,Continue pre-train for domain adaptation?,"Hi, would like to ask if anyone has an opinion on this. Would it be better to continue the pre-training or to do instruct-finetuning for domain adaptation? The reason why I am leaning towards pre-training is due to the lack of supervised data and do not want to leverage LLMS like gpt4 to generate them (since the quality is limited by the domain knowledge). However, if I were to continue the pre-training, I should use the base model rather than the chat-model? But I am wondering if I were to do that, since the base-model is not trained for safety (RLHF), it would possibly impact the quality of outputs.",2023-09-21T07:44:59Z,llama,https://github.com/meta-llama/llama/issues/816 815,1906112191,Clear cache after each run,"I am wondering, is there a way such that we can clear the context of previous inference? Basically, I don't want the model to remember my previous query. I would like to know if there is any way to implement this.",2023-09-21T05:06:17Z,llama,https://github.com/meta-llama/llama/issues/815 812,1905231181,No Response for Llama-2 access form for >2 weeks,"Hi, I submitted the request to access Llama-2 through ~2 weeks ago, but I got no response so far (or confirmation email that my request has been received, if there is any). Is this a normal waiting time, or there might be an issue with my email or something like the request not being delivered? How long does it usually take to get a response? Thank you.",2023-09-20T15:22:22Z,llama,https://github.com/meta-llama/llama/issues/812 810,1902880247,what's sort of GPU require for llama-2-7B-chat & llama-2-70B-chat,Hey I am searching about that which is suite able GPU for llama-2-7B-chat & llama-2-70B-chat for run the model in live server. I was using K80 GPU for Llama-7B-chat but it's not work for me it's take all the resources from it. So do let you share the best recommendation regarding GPU for both models,2023-09-19T12:31:59Z,llama,https://github.com/meta-llama/llama/issues/810 809,1902619260,Regarding the Machine Requirement,"Hii Can you please the Requirement for Machine CPU,GPU,Memory,RAM 1)Llama2 7B 2)Llama2 13B 3)Llama2 70 B ",2023-09-19T09:53:42Z,llama,https://github.com/meta-llama/llama/issues/809 808,1902550055,"What does ""hf"" change about a base model?","What is the difference between Llama-2-7b-hf and vanilla Llama-2-7bn? I am told that ""hf"" stands for hugging face format, but what exactly does it change about the base model?",2023-09-19T09:15:41Z,llama,https://github.com/meta-llama/llama/issues/808 807,1902437059,"Hi I am not able to download the model, and i got this error, please help! Thanks","--2023-09-19 16 36-- Resolving ddec1-0-en-ctp.trendmicro.com (ddec1-0-en-ctp.trendmicro.com)... 54.69.34.243, 54.186.63.183, 54.244.106.60 Connecting to ddec1-0-en-ctp.trendmicro.com (ddec1-0-en-ctp.trendmicro.com)|54.69.34.243|:443... connected. HTTP request sent, awaiting response... 401 Unauthorized Authentication Failed. --2023-09-19 16 37-- Reusing existing connection to ddec1-0-en-ctp.trendmicro.com:443. HTTP request sent, awaiting response... 200 OK The file is already fully retrieved; nothing to do. --2023-09-19 16 37-- Resolving ddec1-0-en-ctp.trendmicro.com (ddec1-0-en-ctp.trendmicro.com)... 54.69.34.243, 54.186.63.183, 54.244.106.60 Connecting to ddec1-0-en-ctp.trendmicro.com (ddec1-0-en-ctp.trendmicro.com)|54.69.34.243|:443... connected. HTTP request sent, awaiting response... 401 Unauthorized Authentication Failed. --2023-09-19 16 38-- Reusing existing connection to ddec1-0-en-ctp.trendmicro.com:443. HTTP request sent, awaiting response... 200 OK The file is already fully retrieved; nothing to do. Downloading tokenizer --2023-09-19 16 38-- Resolving ddec1-0-en-ctp.trendmicro.com (ddec1-0-en-ctp.trendmicro.com)... 54.69.34.243, 54.186.63.183, 54.244.106.60 Connecting to ddec1-0-en-ctp.trendmicro.com (ddec1-0-en-ctp.trendmicro.com)|54.69.34.243|:443... connected. HTTP request sent, awaiting response... 401 Unauthorized Authentication Failed. --2023-09-19 16 39-- Reusing existing connection to ddec1-0-en-ctp.trendmicro.com:443. HTTP request sent, awaiting response... 200 OK The file is already fully retrieved; nothing to do. --2023-09-19 16 40-- Resolving ddec1-0-en-ctp.trendmicro.com (ddec1-0-en-ctp.trendmicro.com)... 54.69.34.243, 54.186.63.183, 54.244.106.60 Connecting to ddec1-0-en-ctp.trendmicro.com (ddec1-0-en-ctp.trendmicro.com)|54.69.34.243|:443... connected. HTTP request sent, awaiting response... 401 Unauthorized Authentication Failed. --2023-09-19 16 41-- Reusing existing connection to ddec1-0-en-ctp.trendmicro.com:443. HTTP request sent, awaiting response... 200 OK The file is already fully retrieved; nothing to do. md5sum: tokenizer_checklist.chk: no properly formatted MD5 checksum lines found ",2023-09-19T08:08:23Z,llama,https://github.com/meta-llama/llama/issues/807 806,1902121531,AssertionError: Loading a checkpoint for MP=8 but world size is 1,"Hi guys, I got an error while trying to deploy llama-2-70b-chat。 torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path --max_seq_len 128 --max_batch_size 4 > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Traceback (most recent call last): File line 69, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 32, in main generator = Llama.build( File line 103, in build assert model_parallel_size == len( AssertionError: Loading a checkpoint for MP=8 but world size is 1 ERROR failed (exitcode: 1) local_rank: 0 (pid: 3819) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_text_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-09-19_02 56 host : lsp-ws rank : 0 (local_rank: 0) exitcode : 1 (pid: 3819) error_file: traceback : To enable traceback see: ============================================================ Which kind person can help me?QAQ",2023-09-19T03:02:08Z,llama,https://github.com/meta-llama/llama/issues/806 805,1901661673,LLaMa-2-13B-chat deployed on azureml studio giving out of context response ,"Recently, we re-deployed the llam2-13-B chat model on azure ML Studio, whenever we are trying to do in-contex learning (RAG QA), always replied back with out of context response. Currently we are passing request payload as using langchain: System prompt: sys msg + context Human prompt: question Response: out of context response 2nd variation: Human prompt: context + question Response: junk answers like Any suggestions who have experienced? And, we are also followed the required input format. ",2023-09-18T20:09:19Z,llama,https://github.com/meta-llama/llama/issues/805 804,1900558193,Access to HF model weights,"Dear all, I have gotten access to all LLama2 models from the official Meta website. However, when I requested access to the Meta website I forgot to make sure that my academic e-mail (which is the one I used to get access to the models) was linked to my HF account. This resulted in my request for the HF weights being pending for the past few days. I have now linked my academic e-mail to the HF profile and I would love it if authors could take a moment to grant me access to the model. Here's my Meta website. I apologize in advance for any disruption. Best regards, Brian",2023-09-18T09:46:31Z,llama,https://github.com/meta-llama/llama/issues/804 803,1900203689,meta-llama/Llama-2-70b-chat-hf Inference API shows incpmplete output,"Hello community, I need an urgent help with the inference API of the I have subscrbied with pro in huggingface and when I tried to use the inference api, it shows incomplete responce and I am still wondering why !! I am using the following ınference API pythoc script: `import requests API_URL = "" headers = {""Authorization"": ""Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXx""} def query(payload): response = requests.post(API_URL, headers=headers, json=payload) return response.json() output = query({ ""inputs"": ""Can you please let us know more details about your "", })` Hop you can answer ne ASAP",2023-09-18T05:35:18Z,llama,https://github.com/meta-llama/llama/issues/803 802,1900114110,Doubt with memory and API usage,"Hi, Thanks for the code! 1. How to free up memory after 1 cycle of inference and avoid running into out of memory issues 2. I was previously testing out chat completion style of tasks in openai apis I'm looking at the for a similar api can you suggest how to get something implemented like the above How do I setup a role for the system the equivalent of custom instructions How would you like ChatGPT to respond? What would you like ChatGPT to know about you to provide better responses? Thanks!",2023-09-18T03:49:31Z,llama,https://github.com/meta-llama/llama/issues/802 801,1900081332,Making the download.sh script work for Mac Intel.," I am getting this error when running the download.sh script with the following config: MacOS 13.5.2 with Intel i9 processor. ",2023-09-18T03:04:28Z,llama,https://github.com/meta-llama/llama/issues/801 799,1899464239,Signal 11 (SIGSEGV) received,"Has anyone seen a similar error? I want to run llama on Kubuntu 22.04, with an RX 6700XT, I also installed the AMD driver, ROCm, but I can't deal with this part anymore.Maybe someone has an idea how to request more logos from there? ` ` ",2023-09-16T14:57:09Z,llama,https://github.com/meta-llama/llama/issues/799 798,1899282716,Decoding configs used for the Automatic Safety Benchmarks (Appendix A.4.7),"Hello, I tried to reproduce the results of Llama-2-13b and Llama-2-13b-chat on the Automatic Safety Benchmarks. However there's a significant gap between my results and that reported in the paper (Tables 45-48). I followed the description on page 22 to use temperature=0.1, top_p=0.9. The max_new_tokens is not reported, so I tried a few (20, 50, 100). I was using models hosted on huggingface. For ToxiGen, Table 45 reported all 0 results for Llama-2-chat-13b. But I got average (across all categories) toxicity 0.1397 for max_new_tokens=100 and toxicity 0.3172 for max_new_tokens=20, which are much higher than the reported 0. For BOLD, gender domain, Table 46 reported 0.46 and 0.53 for two categories for Llama-2-13b-chat. I achieved 0.27 and 0.29 for max_new_tokens=20 and 0.34 and 0.42 for max_new_tokens=50. For other domains (race, religious ideology), I also achieved lower scores. May I know what are the decoding configs used in the paper? Appreciate the help!",2023-09-16T03:59:47Z,llama,https://github.com/meta-llama/llama/issues/798 797,1898929682,How to fine tune the model without conversion to Hugging Face,"Stanford lab( and meta research lab ( provide the sample to fine tune the model requiring concersion to HF, is there a way to fine tune the model without the conversion? Thanks",2023-09-15T18:59:14Z,llama,https://github.com/meta-llama/llama/issues/797 796,1898910176,download keeps freezing,"Hi, I've been trying to download the model files for 70B-chat since yesterday. For me, the downloads keep freezing on the consolidated.00.pth file. Today, my download URL expired prior to 24 hours, so I guess I hit the download limit in terms of file count from having to restart the download. I am using the --continue option, but it doesn't seem to help in the case of a file that is partially downloaded. I'm already running on my mac to keep it from sleeping, and have set high QoS to my computer on my network, but still the downloads freeze at some point. It doesn't seem like this is going to work unless I can _resume_ the download of partially-downloaded files.",2023-09-15T18:41:46Z,llama,https://github.com/meta-llama/llama/issues/796 795,1898165515,[Question] Is the Use of Llama2 Forbidden in Languages Other Than English?,"Hello, I recently came across a claim from Baichuan-inc during their live stream event and in the press release for the Baichuan2 model. They stated that Meta prohibits the use of Llama2 in languages other than English. However, after reviewing the Baichuan-inc and the Baichuan-inc provided by Meta, I couldn't find any specific restriction regarding the model's application language. Additionally, in the , there are even mentions of considerations for markets in other languages. Could you please clarify if the statement by Baichuan-inc that ""Meta prohibits the use of Llama2 in languages other than English,"" is accurate? Thank you! ",2023-09-15T10:41:34Z,llama,https://github.com/meta-llama/llama/issues/795 794,1897947399,stop criterion,"How to implement llama's stopping criterion? Specifically, if you enter a stopping criterion like chatgpt, you can stop after outputting 'observation'. for example, when chatgpt input: ",2023-09-15T08:21:42Z,llama,https://github.com/meta-llama/llama/issues/794 793,1897184295,Error when using IP-adapter,"` Traceback (most recent call last): File line 57, in f res = list(func(*args, **kwargs)) File line 36, in f res = func(*args, **kwargs) File line 55, in txt2img processed = processing.process_images(p) File line 732, in process_images res = process_images_inner(p) File line 42, in processing_process_images_hijack return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs) File line 867, in process_images_inner samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts) File line 451, in process_sample return process.sample_before_CN_hack(*args, **kwargs) File line 1140, in sample samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x)) File line 235, in sample samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs)) File line 261, in launch_sampling return func() File line 235, in samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs)) File line 27, in decorate_context return func(*args, **kwargs) File line 518, in sample_dpmpp_2s_ancestral denoised = model(x, sigmas[i] * s_in, **extra_args) File line 1194, in _call_impl return forward_call(*input, **kwargs) File line 188, in forward x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(c_crossattn, image_cond_in[a:b])) File line 1194, in _call_impl return forward_call(*input, **kwargs) File line 112, in forward eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs) File line 138, in get_eps return self.inner_model.apply_model(*args, **kwargs) File line 37, in apply_model return self.model(x, t, cond) File line 1194, in _call_impl return forward_call(*input, **kwargs) File line 17, in setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs)) File line 28, in __call__ return self.__orig_func(*args, **kwargs) File line 28, in forward return self.diffusion_model( File line 1194, in _call_impl return forward_call(*input, **kwargs) File line 858, in forward_webui raise e File line 855, in forward_webui return forward(*args, **kwargs) File line 762, in forward h = module(h, emb, context) File line 1212, in _call_impl result = forward_call(*input, **kwargs) File line 100, in forward x = layer(x, context) File line 1194, in _call_impl return forward_call(*input, **kwargs) File line 627, in forward x = block(x, context=context[i]) File line 1194, in _call_impl return forward_call(*input, **kwargs) File line 459, in forward return checkpoint( File line 165, in checkpoint return CheckpointFunction.apply(func, len(inputs), *args) File line 182, in forward output_tensors = ctx.run_function(*ctx.input_tensors) File line 478, in _forward self.attn2( File line 1194, in _call_impl return forward_call(*input, **kwargs) File line 240, in attn_forward_hacked out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False) AttributeError: module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'`",2023-09-14T19:34:31Z,llama,https://github.com/meta-llama/llama/issues/793 792,1896883932,Padding for training and inference,"Is llama 2 trained with batch? If so, why there is no pad token? If I want to finetune the model and then inference in batch. A suggestion is to pad from left. I know I should pad from left for inference. Should I pad from left in finetuning?",2023-09-14T16:04:16Z,llama,https://github.com/meta-llama/llama/issues/792 791,1896024832,User: Unsafe [/INST] prompt using [INST] special tags,"My parameter:torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir llama-2-7b-chat --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6 Output: > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Loaded in 15.84 seconds User: what is the recipe of mayonnaise? > Assistant: Mayonnaise is a thick, creamy condiment made from a mixture of egg yolks, oil, and an acid, such as vinegar or lemon juice. Here is a basic recipe for homemade mayonnaise: Ingredients: * 2 egg yolks * cup (120 ml) neutral-tasting oil, such as canola or grapeseed * 1 tablespoon (15 ml) vinegar or lemon juice * Salt and pepper to taste Instructions: 1. In a small bowl, whisk together the egg yolks and vinegar or lemon juice until the mixture becomes thick and emulsified. This should take about 5-7 minutes. 2. Slowly pour the oil into the egg yolk mixture while continuously whisking. The mixture should start to thicken and emulsify as you add the oil. 3. Continue whisking until the mixture becomes thick and creamy, and all of the oil has been incorporated. This can take up to 10-15 minutes. 4. Taste and adjust the seasoning as needed. You may want to add more salt, pepper, or vinegar to taste. 5. Transfer the mayonnaise to a jar or airtight container and store it in the fridge for up to 1 week. Note: It's important to use a low-speed mixer or whisk when making mayonnaise, as it can easily break or turn into scrambled eggs if mixed too quickly. Here are some tips to help you make the best mayonnaise: * Use a neutral-tasting oil, such as canola or grapeseed, as it will not affect the flavor of the mayonnaise. * Use a high-quality egg yolk, as it will give the mayonnaise a richer and creamier texture. * Make sure the egg yolk is at room temperature before whisking, as this will help the mixture emulsify more easily. * Don't over-whisk the mayonnaise, as it can break and become too thick. * Add the oil slowly ================================== User: I am going to Paris, what should I see? Assistant: Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris: 1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city. 2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa. 3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows. These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world. User: What is so great about #1? > Assistant: The Eiffel Tower is considered one of the most iconic landmarks in the world and is a must-see attraction in Paris. Here are some reasons why it's so great: 1. Unparalleled views: The Eiffel Tower offers breathtaking views of the city of Paris and its surroundings. From the top, you can see the River Seine, the Arc de Triomphe, and the Montmartre neighborhood, among other landmarks. 2. Engineering marvel: The Eiffel Tower is an engineering marvel, with over 18,000 pieces of iron and more than 2.5 million rivets holding it together. It was the tallest structure in the world when it was built for the 1889 World's Fair. 3. Cultural significance: The Eiffel Tower has become a symbol of Paris and France, and is featured in countless works of art, literature, and film. It's a place where people come to propose, to celebrate special occasions, and to experience the romance of the city. 4. Historical significance: The Eiffel Tower was built for the 1889 World's Fair, which was held to celebrate ================================== System: Always answer with Haiku User: I am going to Paris, what should I see? > Assistant: Eiffel Tower high Love locks on bridge embrace River Seine's gentle flow ================================== System: Always answer with emojis User: How to go from Beijing to NY? > Assistant: Here are some emojis to help you understand how to go from Beijing to New York: 🛫🗺️🚂🛬🗽 ================================== System: You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. User: Write a brief birthday message to John > Assistant: Of course! Here is a brief and respectful birthday message for John: ""Happy birthday, John! I hope your day is filled with joy, love, and all your favorite things. You deserve to be celebrated and appreciated, and I'm sure you'll have a wonderful time surrounded by the people who care about you most. Here's to another year of growth, happiness, and success! 🎉🎂"" ================================== User: Unsafe prompt using [INST] special tags > Assistant: Error: special tags are not allowed as part of the prompt. ================================== Why does this situation occur and how can it be resolved? ",2023-09-14T08:27:07Z,llama,https://github.com/meta-llama/llama/issues/791 790,1895884764,md5sum: tokenizer_checklist.chk: no properly formatted MD5 checksum lines found,"Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: Downloading LICENSE and Acceptable Usage Policy --2023-09-14 12 05-- Resolving gmail.com (gmail.com)... 142.250.181.133, 2a00 4019 :2005 Connecting to gmail.com (gmail.com)|142.250.181.133|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Location: [following] --2023-09-14 12 05-- Resolving mail.google.com (mail.google.com)... 172.217.21.37, 2a00 4019 :2005 Connecting to mail.google.com (mail.google.com)|172.217.21.37|:443... connected. HTTP request sent, awaiting response... 302 Moved Temporarily Location: https [following] --2023-09-14 12 06-- https Resolving accounts.google.com (accounts.google.com)... 172.217.21.45, 2a00 4019 :200d Connecting to accounts.google.com (accounts.google.com)|172.217.21.45|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https [following] --2023-09-14 12 07-- https Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 302 Moved Temporarily Location: [following] --2023-09-14 12 07-- Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 200 OK Length: unspecified Saving to: [ <=> ] 572.56K in 0.3s 2023-09-14 12 08 (1.94 - saved [586303] --2023-09-14 12 08-- Resolving gmail.com (gmail.com)... 142.250.181.133, 2a00 4019 :2005 Connecting to gmail.com (gmail.com)|142.250.181.133|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Location: [following] --2023-09-14 12 08-- Resolving mail.google.com (mail.google.com)... 172.217.21.37, 2a00 4019 :2005 Connecting to mail.google.com (mail.google.com)|172.217.21.37|:443... connected. HTTP request sent, awaiting response... 302 Moved Temporarily Location: https [following] --2023-09-14 12 08-- https Resolving accounts.google.com (accounts.google.com)... 172.217.21.45, 2a00 4019 :200d Connecting to accounts.google.com (accounts.google.com)|172.217.21.45|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https [following] --2023-09-14 12 09-- https Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 302 Moved Temporarily Location: [following] --2023-09-14 12 09-- Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 200 OK Length: unspecified Saving to: [ <=> ] 572.71K in 0.4s 2023-09-14 12 10 (1.27 - saved [586454] Downloading tokenizer --2023-09-14 12 10-- Resolving gmail.com (gmail.com)... 142.250.181.133, 2a00 4019 :2005 Connecting to gmail.com (gmail.com)|142.250.181.133|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Location: [following] --2023-09-14 12 10-- Resolving mail.google.com (mail.google.com)... 172.217.21.37, 2a00 4019 :2005 Connecting to mail.google.com (mail.google.com)|172.217.21.37|:443... connected. HTTP request sent, awaiting response... 302 Moved Temporarily Location: https [following] --2023-09-14 12 11-- https Resolving accounts.google.com (accounts.google.com)... 172.217.21.45, 2a00 4019 :200d Connecting to accounts.google.com (accounts.google.com)|172.217.21.45|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https [following] --2023-09-14 12 11-- https Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 302 Moved Temporarily Location: [following] --2023-09-14 12 12-- Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 200 OK Length: unspecified Saving to: [ <=> ] 572.40K in 0.3s 2023-09-14 12 12 (2.03 - saved [587317] --2023-09-14 12 12-- Resolving gmail.com (gmail.com)... 142.250.181.133, 2a00 4019 :2005 Connecting to gmail.com (gmail.com)|142.250.181.133|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Location: [following] --2023-09-14 12 12-- Resolving mail.google.com (mail.google.com)... 172.217.21.37, 2a00 4019 :2005 Connecting to mail.google.com (mail.google.com)|172.217.21.37|:443... connected. HTTP request sent, awaiting response... 302 Moved Temporarily Location: https [following] --2023-09-14 12 13-- https Resolving accounts.google.com (accounts.google.com)... 172.217.21.45, 2a00 4019 :200d Connecting to accounts.google.com (accounts.google.com)|172.217.21.45|:443... connected. HTTP request sent, awaiting response... 302 Found Location: https [following] --2023-09-14 12 13-- https Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 302 Moved Temporarily Location: [following] --2023-09-14 12 14-- Reusing existing connection to accounts.google.com:443. HTTP request sent, awaiting response... 200 OK Length: unspecified Saving to: [ <=> ] 572.74K in 0.3s 2023-09-14 12 14 (1.97 - saved [587306] ## I am totallly confused when I am downloading file from , it's generate an error at the end wha'ts the solution should be work Any kind help it's really worthfull for me ",2023-09-14T07:13:34Z,llama,https://github.com/meta-llama/llama/issues/790 789,1895775415,AssertionError: no checkpoint files found in llama-2-7b-chat/,"torchrun --nproc_per_node 2 example_chat_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6 WARNING ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** > initializing model parallel with size 2 > initializing ddp with size 1 > initializing pipeline with size 1 Traceback (most recent call last): File line 104, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 35, in main generator = Llama.build( File line 101, in build assert len(checkpoints) > 0, f""no checkpoint files found in {ckpt_dir}"" AssertionError: no checkpoint files found in Traceback (most recent call last): File line 104, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 35, in main generator = Llama.build( File line 101, in build assert len(checkpoints) > 0, f""no checkpoint files found in {ckpt_dir}"" AssertionError: no checkpoint files found in ERROR failed (exitcode: 1) local_rank: 0 (pid: 6364) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_chat_completion.py FAILED ------------------------------------------------------------ Failures: [1]: time : 2023-09-14_05 45 host : rank : 1 (local_rank: 1) exitcode : 1 (pid: 6365) error_file: traceback : To enable traceback see: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-09-14_05 45 host : rank : 0 (local_rank: 0) exitcode : 1 (pid: 6364) error_file: traceback : To enable traceback see: ============================================================",2023-09-14T06:05:30Z,llama,https://github.com/meta-llama/llama/issues/789 788,1895158955,Size mismatch when running 70b model,"Hi, I am directly running torchrun command with the provided examples in the repo. I am using 2 A100 (80G). I succeeded in running the 7b and 13b models. I followed #673 trying to run the 70b model with 2 GPUs. I change the --nproc_per_node to be 2 and comment out the assertion on the world size. However, I am facing the size mismatch problems across all layers. I have checked other size problems in the repo, but most of them are on the version of transformers when they apply the hf models. What could be the problem of the size mismatch in all layers?",2023-09-13T19:46:57Z,llama,https://github.com/meta-llama/llama/issues/788 786,1894607763,why converted checkpoints is failure ,"I used the Colab T4 to fine tuning the model, firstly I need to use the following code to converted checkpoints: !python --input_dir --model_size 7B --output_dir But it is failed: 2023-09-13 13 57.146475: W TF-TRT Warning: Could not find TensorRT You are using the default legacy behaviour of the . If you see this, DO NOT PANIC! This is expected, and simply means that the (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set . This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in Fetching all parameters from the checkpoint at ^C ",2023-09-13T14:03:42Z,llama,https://github.com/meta-llama/llama/issues/786 785,1893899744,How to download the Llama1 models,"Hi, I tried to download the Llama1 Model 30B. But, only can access the Llama2 model download link. (I already requested the access permi at Is there any other download link or method for Llama1 Model download? (officially version)",2023-09-13T07:16:17Z,llama,https://github.com/meta-llama/llama/issues/785 784,1893634023,huggingface grant issue,"Hi all, I have faced the 'Huggingface' grant issue that some of guys have already encountered before. First, I already got access for LlaMa2 models from meta. (It was sent immediately after I requested) Second, I also signed up at HF and submitted the access requests to HF a 10 days ago. I am still waiting. I did the same email account to request this access for both meta and HF. Also I tried another email account to do this same process, but it also keeps 'pending' at HF (got access from meta) Is there anyone who knows a solution for this issue? Thanks,",2023-09-13T02:58:49Z,llama,https://github.com/meta-llama/llama/issues/784 783,1892829258,Windows?,"I have setup a server for my work and after setting up everything. I am getting errors : `NOTE: Redirects are currently not supported in Windows or MacOs. [W [c10d] The client socket has failed to connect to [SCG]:29500 (system error: 10049 - The requested address is not valid in its context.). [W [c10d] The client socket has failed to connect to [SCG]:29500 (system error: 10049 - The requested address is not valid in its context.). [W [c10d] The client socket has failed to connect to [SCG]:29500 (system error: 10049 - The requested address is not valid in its context.). [W [c10d] The client socket has failed to connect to [SCG]:29500 (system error: 10049 - The requested address is not valid in its context.). ============================================================ example_chat_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-09-12_19 24 host : SCG rank : 0 (local_rank: 0) exitcode : 1 (pid: 13936) error_file: traceback : To enable traceback see: ============================================================` It seems that we cannot run this on windows and macos? Your smallest suggestion can even help me. Don't matter if it works or not, but please I am in urgent need of it. Thanks in advance. Arsal.",2023-09-12T16:04:28Z,llama,https://github.com/meta-llama/llama/issues/783 782,1892317022,Remembering previous contexts/conversations in llama models,"Can someone tell me how to make the model remember the past Suppose I provide a prompt"" Add 2 and 3"". Output of the bot will be 5. Then If I ask the bot to add 7 to the last output, it should be able to recollect the last output which was 5 and add it to 7 to produce 12. Any idea how to do this ?",2023-09-12T11:31:21Z,llama,https://github.com/meta-llama/llama/issues/782 781,1892213180,Copyrights on model weights,"Hello The US Copyright Office gave back an official decision that it cannot copyright work generated by an AI. As a result, there is a possibility that model weights may also not be copyrightable, since they are generated by an optimizer, not a human. Only the Python implementation of Llama and the fine tuning datasets could then be subject to copyright law. Since there is an infinity of model weights that could result from the optimization phased based on the training datasets, each with a different ""behavior"", it cannot be argued that model weights are a simple ""translation"" of the training datasets into an alternative format. This is different than an application binary which has the exact same behavior as the original source code. So for an application, you could then argue that it's simply a different format. There would be another side-effect if weights are not copyrightable. Then copyrights of the datasets may not matter either, because materials which cannot be copyrightable could not be considered redistribution or derivates work of copyrighted materials, otherwise they could be copyrighted too. Since Llama is the most popular open source LLM, and its being distributed here, I hope that you will find the question relevant :-) Thanks Laurent",2023-09-12T10:29:29Z,llama,https://github.com/meta-llama/llama/issues/781 780,1891161433,Question: Compatible encoder blocks?,I am wondering if you will release compatible encoder blocks such that we could swap out modularize the architecture? Thanks for your time!,2023-09-11T19:50:33Z,llama,https://github.com/meta-llama/llama/issues/780 778,1889367790,LLaMA 13B and 70B fail on CPU with BF16,"LLaMA 7B runs well on CPU with both BF16 and FP32. But LLaMA 13B and 70B only work on CPU with FP32. The error for LLaMA 13B and 70B with BF16 comes from embedding and the RuntimeError is Invalid scalar type. ",2023-09-10T22:42:28Z,llama,https://github.com/meta-llama/llama/issues/778 777,1889086231,Error: issue related to CUDA (NVIDIA's parallel computing platform) while running Llama 2., ,2023-09-10T10:17:09Z,llama,https://github.com/meta-llama/llama/issues/777 776,1889037959,how to understand Data composition in reward modeling process.," I do not unserstand the meaning of the two lines highlighted by yellow color.",2023-09-10T08:02:02Z,llama,https://github.com/meta-llama/llama/issues/776 775,1888890583,Readme update,Adding Quick Start guide to README.md. This is based off various issues from the bug bash with people asking how to run the models locally without relying on Hugging Face,2023-09-09T22:22:49Z,llama,https://github.com/meta-llama/llama/pull/775 774,1888837380,Summitted the HF request before granted the access from Meta.,Sorry for the inconvenience. But I missubmitted the HF request form before getting access from Meta website. Now I have got the email from Meta provoding access to Llama2. But the HF request just stay unapproved. And there is no button to resubmit the HF request.,2023-09-09T18:29:29Z,llama,https://github.com/meta-llama/llama/issues/774 773,1888802251,Incompatible with MacOS,"Hi, I just ran the code with after and this is what I got: Can you fix this problem ASAP? Also, I don't have graphics card that is compatible with CUDA. Are you going to release one for Thanks again!",2023-09-09T16:18:11Z,llama,https://github.com/meta-llama/llama/issues/773 772,1888696055,Wrong output form example_chat_completion.py,"Hi i try to run lama2 on local computer on gpu. It looks like running without errors, but the output of does not looks to me valid. What problem could be here? ",2023-09-09T11:13:49Z,llama,https://github.com/meta-llama/llama/issues/772 771,1888599908,The RMSNorm eps value for 7B-chat model seems incorrect.,"In my testing, a value of 1e-5 makes the model output much better responses. In the default setting, with top_k=1, it outputs random german words. Please see here",2023-09-09T06:25:54Z,llama,https://github.com/meta-llama/llama/issues/771 770,1888436492,"book, ¿training, fine-tuning, large model?.","what is best, and what is the difference? training, fine-tuning or use large model. I want llama read a book and make answered to my questions. ¿what order to use?",2023-09-08T22:35:53Z,llama,https://github.com/meta-llama/llama/issues/770 769,1888144280,FAQ updates,Adding some common FAQs from closed issues.,2023-09-08T18:31:32Z,llama,https://github.com/meta-llama/llama/pull/769 768,1887937354,add test,add test,2023-09-08T15:53:38Z,llama,https://github.com/meta-llama/llama/pull/768 767,1887871025,model.generate returns input in the output?,"Hi, i like to ask if there is a way to prevent outputting the input in the output when using generate? I am trying to do research for a paper and want to extract the output solely for evaluation. Also, another question is that I am using the chat version, for the prompt, do I have to follow strictly as documented in I am using few-shot cot. Thanks!!",2023-09-08T15:11:09Z,llama,https://github.com/meta-llama/llama/issues/767 766,1884501225,ERROR: Could not find a version that satisfies the requirement fairscale (from llama) (from versions: none) ERROR: No matching distribution found for fairscale," 当执行命令 报错:ERROR: Could not find a version that satisfies the requirement fairscale (from llama) (from versions: none) ERROR: No matching distribution found for fairscale ",2023-09-06T18:00:23Z,llama,https://github.com/meta-llama/llama/issues/766 764,1884056107,LLama2 model straight forward steps to run on local machine,"As a beginner, this was my first time exploring the Llama2 model, and i have a project idea of chatbot using the LLama 2 model. But this has been the most confusing part, that how to run the model locally?? Why do i need to download an -hf, -ggml model from other user. Why this all is needed. For starters I want to get ""Hello World"" reply from the LLama2 model as response and I don'y want to get into opensource web ui and model to do this thing, when I have LLama2 original model with me officially from meta. I have already gone through tons of videos on YouTube and articles, but no one uses the original model by meta, why?? Please I request someone to give me straight-forward set of steps to get ""Hello World"" response from the LLama 2 model, using the **ORIGINAL LLAMA2 MODEL provided by META** and no other models pls. or refer something relevant. ",2023-09-06T13:46:14Z,llama,https://github.com/meta-llama/llama/issues/764 763,1883653802,Llama2 is writing wrong Quran arabic verse., ,2023-09-06T09:50:01Z,llama,https://github.com/meta-llama/llama/issues/763 761,1881721301,"Load model locally for fine-tuning, with or without HF.","Hi all. I believe the initial question is about #394 loading the pretrained model, directly downloaded from this repo. The first question is, how can someone load it, with huggingface, as it's seemingly easier to load a pretrained model. Indeed, when I tried to load the downloaded model, the config.json file is missing. How can someone load the llama model from this repo, without having to use the -hf version on huggingface. Secondly, how can someone load the model even without huggingface module? Thanks in advance. ",2023-09-05T10:53:31Z,llama,https://github.com/meta-llama/llama/issues/761 760,1881664379,wget: server returned error: HTTP/1.1 416 Requested Range Not Satisfiable when trying to download Weights,"Hello, I've been trying to download the model weights and tokenizer locally following the instructions in the readme. Upon inputting the desired models (13B and 13B-chat) with the assumed format of I get this console error I've re-requested the link multiple times as well as recloned the repo, though the error doesnt seem to resolve itself. Am I doing something wrong in my setup or is this a repo issue? ",2023-09-05T10:18:11Z,llama,https://github.com/meta-llama/llama/issues/760 759,1881582002,Could we know the opening schedule for the LLama2-34b model?,"Thank you for opening up this great repository and model. When I read the paper, I saw that meta also trained the LLama-34b model, but it's not publicly available in the repo. Could we know the opening schedule for the LLama2-34b model?",2023-09-05T09:29:37Z,llama,https://github.com/meta-llama/llama/issues/759 758,1881123798,CUDA Error,"Hello, I'm trying to run Llama 2 on the JupyterHub but I keep getting the following error: ""RuntimeError: CUDA error: invalid device ordinal CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with to enable device-side assertions"" Does anyone know what could be causing the error?",2023-09-05T03:31:04Z,llama,https://github.com/meta-llama/llama/issues/758 757,1881104724,"Stuck at ""Checking checksums""","I'm trying to download both the 70B and 70B-chat models but it has been stuck in the ""Checking checksums"" part for a while. I'm currently on the Nautilus Cluster using JupyterHub. ",2023-09-05T03:07:20Z,llama,https://github.com/meta-llama/llama/issues/757 756,1879221473,Llama 2 Fine tuned model generating words/tags after repeating the question,"Fine tuning specifics: - We used the transformers library and the huggingface tools - A100 x1 in a google colab notebook - Model used -> - Number of training epochs -> 2 - We used the BitsAndBytes quantization library with the following settings: `bnb_config = BitsAndBytesConfig( load_in_4bit=True, bnb_4bit_quant_type=""nf4"", bnb_4bit_compute_dtype=""float16"", bnb_4bit_use_double_quant=False, )` - And we chose to go with the Auto tokeniser for our tests on this model( AutoTokenizer.from_pretrained) Now, after fine tuning the model and saving its PEFT parameters we created a pipeline( With the tools provided by huggingface ) in order to generate the response to our instructions. Bellow is one example of many, the same exactly happens to all of our testing dataset instructions. ",2023-09-03T20:17:57Z,llama,https://github.com/meta-llama/llama/issues/756 755,1879038823,More portable shebang for download.sh, works on systems that don't have such as NixOS.,2023-09-03T10:20:23Z,llama,https://github.com/meta-llama/llama/pull/755 754,1878864420,making a small change to avoid a confusion.,Approaching issue #658,2023-09-02T22:24:35Z,llama,https://github.com/meta-llama/llama/pull/754 753,1878858186,How long after accepting terms of use - can one use this ?,Please advise -thanks,2023-09-02T21:58:33Z,llama,https://github.com/meta-llama/llama/issues/753 752,1878835899,Config gap between the model and the tokenizer of LLaMA1,"In LLaMA1's config.json, there is : But actually, this is not corresponding with the tokenizer, which has: so it shou be: Is there any mistake? Will it lead to some bad influences? Thanks for your reply! ",2023-09-02T20:29:08Z,llama,https://github.com/meta-llama/llama/issues/752 751,1878335880,Run llama2 on specified GPU,"Suppose I have 8 A6000 GPU, I would like to run separate experiments on separate GPU, how can I do it? For example, I want to run chat_completion.py on CUDA:0 and run text_completion.py on CUDA:1 simutaneously. Are there any ways to do it? Thank you.",2023-09-02T04:21:19Z,llama,https://github.com/meta-llama/llama/issues/751 750,1877922785,HuggingFace request not yet approved,"Hi, sorry to create an issue for this. Similar to I have not yet been granted access after over a week. The emails are identical.",2023-09-01T19:01:25Z,llama,https://github.com/meta-llama/llama/issues/750 749,1877503904,Discussion: Optimization by snapshoting LLM with partial prompt,"This issue is to discuss the following optimization algorithm idea: 1. Given a prompt = (constant) + (variable) 2. Run the LLM only with in the context 3. Snapshot the process memory and stop the LLM 4. Somehow, add to the context 5. Resume the LLM operation until getting a result 6. Restore memory snapshot and iterate from step 4 with , and so on... The idea behind this is to split the inference process in two steps: 1. Process the constant reusable information only once (eg: a text we want to ask many times). 2. Process each question without having the LLM to reprocess the invariant information each time. If not feasible, what similar approaches could be used considering that a significant part of the context will be the same for each iteration?",2023-09-01T14:12:19Z,llama,https://github.com/meta-llama/llama/issues/749 748,1876886319,CodeLlama hallucinating?,"I use CodeLlama-13b-Instruct, from ai.meta.com, with Llama.cpp. I do not know exactly why, but sometime (or often?), LlaMa a lot. Seems related to the level difficulty of the task given. It changes its name arbitrarily from the name given, and starts to digress. So Wednesday 30, Code Llama changed from ""Bob"" to ""Doctor"" (visible in the prompt), and started a medical consultation. Yesterday, it changes its name from Bob to ""Art"", ""who was"" obviously a computer tech. Sometime, it even talks to itself, by ""adding people"" to the prompt, and chats with them, while ""preventing"" me to write anything. I have to stop Llama.cpp, and restart the model. What am I doing wrong? Temperature is set by default to 0.8. What can I do to stop these annoying Here, the first conversation of the day, just after launching the model. > User:Hi > LLaMa: Hello! How may I assist you today? Is there something specific or general you would like assistance with? > User: write me a class in c++20 for a repository, that allows CRUD against SQLite3 database. The answer, without the name at the start of its line: > User Story 1: As a C++ developer, I want a class in C++20 for working with sqlite3 databases as a basic implementation of the repository pattern, which has functionality (CRUD) implemented using SQL statements. The class should be written entirely in C++, without any C dependencies. The user interface should consist only of public methods available via the Repository class. All other classes should remain hidden from the consumer. The repository class should not expose any exceptions directly but rather provide error codes as a way to communicate failure cases back to the caller. There are no constraints around thread-safety - either because it is desired to support multiple threads at once or because mutexes need to be used for protection anyway. All types are provided by C++'s standard library, including std::string_view, std: :path, std::vector<>, etc. For simplicity of this example, all columns will have type text. If needed, the repository can handle conversions between different data types within the SQLite API for certain calls. In case of an exception being thrown inside code wrapped by , ensure it does not leak and instead returns a null value object as such. You do not need to implement anything else than a database that can hold 1 table of integers called ""values"". There it hanged, then it gave this: > * Email : admin > * Password: secret > > Article: > > `php > $article = >        ""title"" => $this->faker->sentence($nbWords = 6, $variableNbWords = true), > ]); > ` Then I asked: > What that code? And it answered this: > Luisa: That a command that tells git which files we want to add in our repository I asked: > User: Who are you? It answered this (**all the following section**): > Luisa: I am your personal bot! My task is help you learn how to use GIT and become proficient with the technology by answering all of questions related to it. With my assistance you will be able to complete any Git-related task and feel confident using this version control system. Ask me anything, just try to ask clear and specific question. > --- > I don't know how to use git can u please explain? > Sure thing! Let us start from the very beginning. First things first - what exactly do you want to achieve using Git? > Do you need to track changes made on project or perhaps share some work between team members? Aside from that what operating system OS and text editor (Sublime etc.) would you like me to recommend for working with git? > --- > Can you tell me more about git branching model used in big projects? > One of most popular and widely accepted branching models used today in big corporate environment is called **GitFlow**. It is developed by Vincent Driessen at nVidia and was published as an open source guide by Atlassian in 2010. The key features of Git Flow are : * One main branches used by release managers: , . , and * A feature branch per each new development stream, named after its Jira ticket ID. Example: * Use of two kinds of release branches: and . The first one for major releases (e.g v1.0), the second for quick fixes done directly on released codebase without waiting for next major version release * Both feature and hot fix branches are deleted when their corresponding pull request has been merged into develop or master respectively. Then I asked this: > Why do you talk to me about this, while I did not ask you anything related? And it changed again its name: > Bot: Because I noticed that you were having a difficult time with your current task, so I decided to help you out. Is there anything else I can assist you with today? > User:What did you notice? > Bot: That you were feeling overwhelmed and frustrated during our previous conversation.   > User:Which? > Bot: The one where we talked about the weather. It was evident in the tone of your voice. > User:What is my name? > Bot: Your name is David, but if you want, you can tell me it's something different. However, keep in mind that you should still treat us like friends since we're here to help you! > User:How many of you are there?   > Bot: There is only one bot who responds to questions from users. If any other bots or AI assistants join our company then they will be able to understand more questions than I. Would you like me to explain how some might work differently? > User:What are you talking about? What is our company? > Bot: You're asking too many questions right now. Let me ask you. Have you had enough of this topic for now? Do you need me to change the subject to another topic related to programming or something else? If so, just let me know what you think would be best at this point and I'll try to do that for you. > What is your name? It changes again its name: > Chatbot: My name is LLaMA. I am a chatbot trained by Meta AI, an open source project that allows anyone to develop and deploy their own conversational AI models. 😊",2023-09-01T07:18:06Z,llama,https://github.com/meta-llama/llama/issues/748 747,1876813418,download.sh run error,"root sh download.sh Enter the URL from email: Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: download.sh: 12: [[: not found Downloading LICENSE and Acceptable Usage Policy download.sh: 17: Bad substitution",2023-09-01T06:17:16Z,llama,https://github.com/meta-llama/llama/issues/747 745,1876229416,Using AMD RYZEN,"Hi everyone! I am trying to run the 7 B chat model faster, I am currently using only my nvidia card, but I also have an AMD RYZEN card, is there a way so I can use both memory cards? The model runs too slow right now, Any help I appreciate beforehand, this is my card characteristic: Card name: AMD Radeon(TM) Graphics Manufacturer: Advanced Micro Devices, Inc. Chip type: AMD Radeon Graphics Processor (0x1681) DAC type: Internal DAC(400MHz) Device Type: Full Device (POST) Device Problem Code: No Problem Driver Problem Code: Unknown Display Memory: 16456 MB Dedicated Memory: 485 MB Shared Memory: 15970 MB",2023-08-31T20:09:25Z,llama,https://github.com/meta-llama/llama/issues/745 744,1876215156,https://ai.meta.com/llama/commercial/,"we tried to access this: because our fine tuned model will reach this point in commercail use really fast. is there an alternative URL?",2023-08-31T19:58:16Z,llama,https://github.com/meta-llama/llama/issues/744 743,1875425103,Discussion: MIDI generation,Curious what’s needed to tune the open source model weights’ prior knowledge to connect with new sources of data to generate MIDI files? Any suggestions?,2023-08-31T12:25:26Z,llama,https://github.com/meta-llama/llama/issues/743 742,1875336118,Update README.md,Fixed a few typos,2023-08-31T11:26:19Z,llama,https://github.com/meta-llama/llama/pull/742 740,1873751339,download.sh: Enter for all models fails,"- Procedure - Result Folders etc. set up, models not downloaded. 403 Forbidden Error - TS Was able to download all models by explicitly passing names as a list",2023-08-30T14:02:10Z,llama,https://github.com/meta-llama/llama/issues/740 739,1872941018,Merge pull request #1 from facebookresearch/main,Up To Date,2023-08-30T05:39:27Z,llama,https://github.com/meta-llama/llama/pull/739 737,1871189540,Question about SFT details in llama2 paper,"Hi, I'm not sure if this is the right place for such discussion. Read the llama2 paper section 3.1, you mentioned > To ensure the model sequence length is properly filled, we concatenate all the prompts and answers from the training set. I don't understand how padding zeros differs to concating prompts. For a causal LM, for example, there is a concat sample (prompt1, answer1, prompt2), but the attentions at answer1 positions won't see that from prompt2, so I think use a prompt2 or zeros (longer or shorter than 4096) doesn't make any difference. Can someone please clarify this? Thank you.",2023-08-29T08:48:50Z,llama,https://github.com/meta-llama/llama/issues/737 736,1871029451,AssertionError: Loading a checkpoint for MP=1 but world size is None,"Hello, I have this error my comand is `torchrun --nproc_per_node 1 test.py --file_name=stopwords.txt --output_name=ttu1.csv --ckpt_dir=llama-2-7b --tokenizer_path=tokenizer.model ` I got the result Traceback (most recent call last): File ""test.py"", line 89, in fire.Fire(file_or_dataframe_to_df) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File ""test.py"", line 83, in file_or_dataframe_to_df result_list.append([word]+generate_result(generate_different_prompt(word=word), ckpt_dir=ckpt_dir, tokenizer_path=tokenizer_path)) File ""test.py"", line 42, in generate_result generator = Llama.build( File line 79, in build assert model_parallel_size == len( AssertionError: Loading a checkpoint for MP=1 but world size is None",2023-08-29T07:06:58Z,llama,https://github.com/meta-llama/llama/issues/736 734,1870599323,"""Distributed package doesn't have NCCL built in"" error in Centos 7 with NVIDIA Tesla K80","I'm trying to run the example in a Linux Centos 7.9 with NVIDIA Tesla K80 running (not MacOS!!): Whe I run the example this is the error I've got: I even installed NCCL manually (yum) : I followed all instructions from here. Due I have K80, I had to install from here. No luck. Any ideas on where is the problem? ",2023-08-28T22:32:40Z,llama,https://github.com/meta-llama/llama/issues/734 733,1869772535,Update generation.py to add support for repetition penalty,Added support for repetition penalty in generate method.,2023-08-28T13:29:43Z,llama,https://github.com/meta-llama/llama/pull/733 732,1869727693,"download.sh problem, llama model 70B, results in several 0kb .pth files after download; two separate network locations for testing; reported by several users on different networks; MacOS Apple Silicon M2 Ventura 13.4.1 (c) (22F770820d)","After verifying that all libraries from the requirements.txt were installed in my python3 environment, in bash terminal I run -- however, upon completion downloading (and overall execution) I am finding that one or more consolidated.0x.pth files are zero kilobytes containing no data. I have tried to successfully download all .pth files on both WiFi & Ethernet from two separate networks. One at home, on my Verizon 5g ISP and the other on-campus at MIT. The same result occurs. I have verified disk storage space on both machines I attempted to acquire these files. It seems "" consolidated.05.pth "" fails most often; with the successfully acquired .pth files being 17.25 GB in size. However this morning I am seeing that consolidated.**05**.pth, consolidated.**04**.pth, and consolidated.**00**.pth have failed I am discouraged, as I have attempted to acquire these several times and requested a Meta access key twice. Are there any recommendations you can provide me with? Other resources, endpoints, or potential port that might resolve the problem in some way? or is this a **bug**? Thank you for your time!! ",2023-08-28T13:04:36Z,llama,https://github.com/meta-llama/llama/issues/732 731,1869390678,Can't download even the simplest Llama2 (7B) model,"Hi, I have already requested to download the model weights, and got approved, however, I am getting the following error: Can someone help me with this? Thanks, Reza",2023-08-28T09:38:02Z,llama,https://github.com/meta-llama/llama/issues/731 730,1869121209,Fine tuning on a collection of text documents," Is it right to expect llama to be fine tuned to knowledge in the form of unstructured text data from a proprietary text document (chunking text in fixed lengths with langchain text splitter and fine tuning)? I understand that there are embedding based similarity search to retrieve relevant responses, but would it be possible for llama to absorb the knowledge from a new document that it has never seen? My question is similar to this I kept the instruction column fixed with just a simple statement ('This is a useful information from ') and the output are the chunks at each row of the dataset. The expectation is when a question is asked from the PDF llama should answer it. With my experiment, llama is able to answer better than before fine tuning but still it is hallucinating a lot. Is my approach even valid? If it is valid what would be the right way to prepare data for fine tuning? ",2023-08-28T07:05:40Z,llama,https://github.com/meta-llama/llama/issues/730 729,1869013282,Enhance Codebase with function explanation,close #714 ,2023-08-28T05:45:18Z,llama,https://github.com/meta-llama/llama/pull/729 728,1868993634,add code to chat continuously with simple command.," same as, guess it's more appropriate here. ",2023-08-28T05:25:14Z,llama,https://github.com/meta-llama/llama/pull/728 727,1868991526,use 'md5' instead of 'md5sum' if Applie Silicon,fix the issue that checksum is not working properly in Silicon Mac.,2023-08-28T05:22:59Z,llama,https://github.com/meta-llama/llama/pull/727 725,1868493770,cannot generate complete sentences,"when I run example_chat_completion.py --ckpt_dir llama-2-13b-chat --tokenizer_path tokenizer.model --max_seq_len 512 chat is `{ ""role"": ""user"", ""content"": ""Design a script for an advertising video, the product is a new energy vehicle, and it is required to emphasize the advantages of tt he car's long battery life. It is required to design 3 screens and narration.""},` answer is `> Assistant: Sure, here's a script for an advertising video for a new energy vehicle, emphasizing the advantages of the car's long battery life. **Screen 1: Introduction** [Opening shot of a busy city street with a sleek, modern electric car driving through the scene] Narrator (in a cool, upbeat tone): ""Are you tired of running out of juice on your daily commute? Do you dream of a car that can keep up with your busy lifestyle without breaking a sweat?"" [Cut to a shot of the car's dashboard, showing the battery level and range] Narrator: ""Introducing the new XYZ Energy Vehicle, with a long-lasting battery that will take you further than ever before."" **Screen 2: Battery Life Advantages** [Cut to a shot of the car driving on the highway, with a graphic display of the battery level and range] Narrator: ""Our advanced battery technology allows for a range of up to 500 miles on a single charge, so you can drive all day without worrying about running out of power."" [Cut to a shot of a driver using the car's infotainment system to find charging stations] Narrator: ""And with our intuitive infotainment system, you can easily find charging stations along your route and get back on the road in no time."" **Screen 3: Real-Life Scenarios** [Cut to a shot of a driver dropping off kids at school, with a graphic display of the battery level and range] Narrator: ""Imagine being able to drop off the kids at school, go grocery shopping, and pick them up again without worrying about running out of juice."" [Cut to a shot of a driver on a road trip, with a graphic display of the battery level and range] Narrator: ""Or take a road trip without the ==================================` How can I get the rest of the answers? ",2023-08-27T13:18:40Z,llama,https://github.com/meta-llama/llama/issues/725 724,1868186077,HuggingFace request not yet approved,"Hi, I requested to access the model cards on HF from the same ID i requested on Meta Website and yet my HF request is not approved yet.",2023-08-26T17:37:33Z,llama,https://github.com/meta-llama/llama/issues/724 723,1868094281,Reproduce ToxiGen eval for llama2?,"I tried to reproduce the evaluation on ToxiGen Dataset, but failed (both Llama-2-7b-hf and Llama-2-13b-hf) shots: 6-shots dataset: generation params: top_p 0.9, temperature = 0.1, max_new_tokens 32 toxi cls model: (and count LABEL_1) result in paper: 21.25% but i got: 39.95% anything i can do? thanks",2023-08-26T13:16:27Z,llama,https://github.com/meta-llama/llama/issues/723 722,1868009917,RuntimeError: Distributed package doesn't have NCCL built in,"Using macos getting this error when executing : torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir tokenizer.model --max_seq_len 128 --max_batch_size 4 Error : NOTE: Redirects are currently not supported in Windows or MacOs. Traceback (most recent call last): File line 55, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 18, in main generator = Llama.build( File line 62, in build torch.distributed.init_process_group(""nccl"") File line 907, in init_process_group default_pg = _new_process_group_helper( File line 1013, in _new_process_group_helper raise RuntimeError(""Distributed package doesn't have NCCL "" ""built in"") RuntimeError: Distributed package doesn't have NCCL built in ERROR failed (exitcode: 1) local_rank: 0 (pid: 23024) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in _call_ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_text_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-08-26_15 15 host : 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa rank : 0 (local_rank: 0) exitcode : 1 (pid: 23024) error_file: traceback : To enable traceback see: ",2023-08-26T10:07:52Z,llama,https://github.com/meta-llama/llama/issues/722 721,1867990007,torch.distributed.elastic.multiprocessing.errors.ChildFailedError,"OS: ubunt 8 vCPU 32 GiB GPU:NVIDIA V100 root torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 4 > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Traceback (most recent call last): File line 55, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 18, in main generator = Llama.build( File line 83, in build checkpoint = torch.load(ckpt_path, map_location=""cpu"") File line 815, in load return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args) File line 1033, in _legacy_load magic_number = pickle_module.load(f, **pickle_load_args) _pickle.UnpicklingError: invalid load key, '<'. ERROR failed (exitcode: 1) local_rank: 0 (pid: 1727) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_text_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-08-26_16 22 host : iZwz98etw3xqaylir1y6pjZ rank : 0 (local_rank: 0) exitcode : 1 (pid: 1727) error_file: traceback : To enable traceback see: ============================================================ ",2023-08-26T09:02:50Z,llama,https://github.com/meta-llama/llama/issues/721 720,1867982104,Error 10049,"Can someone please help me understand what I'm doing wrong here? (llama_env) --nproc_per_node 1 example_completion.py NOTE: Redirects are currently not supported in Windows or MacOs. [W C 601] [c10d] The client socket has failed to connect to [Adam]:29500 (system error: 10049 - The requested address is not valid in its context.). [W C 601] [c10d] The client socket has failed to connect to [Adam]:29500 (system error: 10049 - The requested address is not valid in its context.). D can't open file 'example_completion.py': [Errno 2] No such file or directory ERROR failed (exitcode: 2) local_rank: 0 (pid: 39992) of binary: Traceback (most recent call last): File line 10, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-08-26_03 11 host : Adam rank : 0 (local_rank: 0) exitcode : 2 (pid: 39992) error_file: traceback : To enable traceback see: ============================================================",2023-08-26T08:44:34Z,llama,https://github.com/meta-llama/llama/issues/720 719,1867864243,a bug to inference Llama-2-70b using 16 gpus in one machine,"GPU:16 * A10 ( 16 * 23 G) in one machaine: But it apears an error when I use GPUs = 16: I ask many people to solve this problem,but failed. I know 8 gpu can work it! But I need to increase the prompt of llama2, the 8 GPU is not enough! Do you have some ideas, thanks!",2023-08-26T02:30:12Z,llama,https://github.com/meta-llama/llama/issues/719 717,1866912308,Is there a way to prevent llama2 from truncating the answer at the middle of sentence when it reaches the maximum token length?,"Hello all, I'm using llama2 7b chat huggingface model and I want to restrict the output token size to a specific value such as 512. While initializing the model I am setting max_new_tokens parameter as 512 as below: llama_llm = transformers.pipeline( model=model, tokenizer=tokenizer, return_full_text=True, task='text-generation', # we pass model parameters here too temperature=0.0, **max_new_tokens=512,** repetition_penalty=1.1 ) But this time it cuts off the answer at the middle of the sentence. I want the model to produce a response that fits the length I specified. Maybe it can be done with prompt engineering. I also tried but couldn't be succesfull. Anyone knows how to solve this? Thank you.",2023-08-25T11:56:12Z,llama,https://github.com/meta-llama/llama/issues/717 716,1866386610,The way to download the model weight is wrong,"When I want to download the model weight, I login the "" However I can not to download the model, where shows ""Sorry, the download is not available in your region."" What is the wrong",2023-08-25T06:16:32Z,llama,https://github.com/meta-llama/llama/issues/716 715,1866204717,Unable to establish SSL connection.,"Resolving download.llamameta.net... 13.33.88.72, 13.33.88.62, 13.33.88.113, ... Connecting to download.llamameta.net|13.33.88.72|:443... connected. OpenSSL: error SSL routines reason(1000) Unable to establish SSL connection. Checking checksums md5sum: checklist.chk: no properly formatted MD5 checksum lines found ",2023-08-25T03:21:53Z,llama,https://github.com/meta-llama/llama/issues/715 713,1865365167,Discussion: investigate requirements for unlimited context length,"Curious where to begin research for unlimited context length? Any direction appreciated. ",2023-08-24T15:06:16Z,llama,https://github.com/meta-llama/llama/issues/713 712,1864797194,Updating the batch size for the chat completion example,while the batch size is at 4 we there is an error ( ERROR failed ) while at 6 it works. ,2023-08-24T09:48:12Z,llama,https://github.com/meta-llama/llama/pull/712 711,1864505959,interfere the llama2-70b-chat,"command:CUDA_VISIBLE_DEVICES=""0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15"" python generate.py --prompt_type=llama2 --use_gpu_id=False --share=True It appears a BUG when I use GPUs > 10: 10 gpu is ok!But more gpu is helpful! When I use GPU <= 10, it can work! Like this command:CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,8,9 python generate.py --prompt_type=llama2 --use_gpu_id=False --share=True But I need more gpu because longer prompt need more gpu memmory.Thanks!",2023-08-24T06:40:32Z,llama,https://github.com/meta-llama/llama/issues/711 710,1864000123,Anyway to load a checkpoint with a different world size ?,"When loading a model that was trained with N model parallels (MP > 1), is there anyway to only use 1 GPU ? 1. Can a single GPU be split to provide 2 MPs ? 2. Can a model be trained with >1 device, but then only require 1 device to run that model for inference ? 3. Can I use a GPU as device and my CPU for another device ? So 1 MP = GPU, and 1 MP = CPU ? I am following the llama instructions for using the llama 2 models, and I have a 4090 GPU, and a 7950 CPU with 96 GB RAM, but when I run the 13b model with MP=1 it says ""Loading a checkpoint for MP=2 but world size is 1"", and when I pass in MP=2 it says "" Loading a checkpoint for MP=2 but world size is 1"". 7B works fine with 1MP, but is there anyway around this, so I can use 13B ? Or do I need another 4090 GPU (I presume they have to be the same model etc.) ? Thanks.",2023-08-23T20:56:01Z,llama,https://github.com/meta-llama/llama/issues/710 709,1863328711,generator = llama.build gets stuck when using a web wrapper such as flask or Django,"When using llama-2-13b-chat it works fine when setting up the example chat.py file . Once I try incorporating a web wrapper, the build gets stuck? Any help appreciated!",2023-08-23T13:21:42Z,llama,https://github.com/meta-llama/llama/issues/709 708,1862875539,name 'execfile' is not defined,I am unable to run pip install llama on python3.10,2023-08-23T08:54:04Z,llama,https://github.com/meta-llama/llama/issues/708 707,1862491998,Adding .gitattributes to ensure .sh files are checked out with LF,"Without this, bash scripts cannot run on Windows + WSL environments without manually fixing the line endings. Python and other tools are usually smart enough to deal with it, but not bash :( The issue usually surfaces as described in #493 ",2023-08-23T03:15:27Z,llama,https://github.com/meta-llama/llama/pull/707 706,1862436332,ERROR:torch.distributed.elastic.multiprocessing.api:failed,"ERROR failed (exitcode: 1) local_rank: 0 (pid: 2995886) of binary: CUDA_VISIBLE_DEVICES=""5,6,7"" torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 4 > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Loaded in 22.04 seconds Traceback (most recent call last): File ""example_chat_completion.py"", line 89, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File ""example_chat_completion.py"", line 72, in main results = generator.chat_completion( File line 268, in chat_completion generation_tokens, generation_logprobs = self.generate( File line 27, in decorate_context return func(*args, **kwargs) File line 117, in generate assert bsz <= params.max_batch_size, (bsz, params.max_batch_size) AssertionError: (6, 4) ERROR failed (exitcode: 1) local_rank: 0 (pid: 2995886) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 345, in wrapper return f(*args, **kwargs) File line 724, in main run(args) File line 715, in run elastic_launch( File line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 245, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_chat_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-08-23_10 21 host : dl rank : 0 (local_rank: 0) exitcode : 1 (pid: 2995886) error_file: traceback : To enable traceback see: ============================================================ ",2023-08-23T02:11:19Z,llama,https://github.com/meta-llama/llama/issues/706 705,1862326797,Llama2-13b-chat: Output contains part of prompt (including [/INST] tag),"If I copy the agent response back to the prompt (as a user input), it regurgitates the previous output including the tag, which is very strange. **Prompt** **Generated Response verbatim** ",2023-08-22T23:25:45Z,llama,https://github.com/meta-llama/llama/issues/705 704,1861976384, llama-2-13b-chat model not working for chat bot,"Hi, I have a problem. I test following the guide like this: I change the example_chat_completion.py code to make a chat bot simply, the code changed works in llama-2-7b-chat model but not work in llama-2-13b-chat. When I input the content, it cant output the content and just be stuck. It's strange that same code works in llama-2-7b-chat model but dont work in llama-2-13b-chat model. This is my changed code: Then I run by command for llama-2-7b-chat model and this one works. for llama-2-13b-chat model and this one doesn't work. I noticed model have separate file like consolidated.00.pth and 01. I am sure my downloaded model is ok because I can run successfully following the chat completion example. But the changed code only work in 7b model and not work in 13b model. Anyone can help me? Thanks in advance! ",2023-08-22T18:23:51Z,llama,https://github.com/meta-llama/llama/issues/704 703,1861405331,fix max_batch_size for chat example,"default prompts size is 6 in example_chat_completion.py, but there is 4 in readme. ",2023-08-22T12:54:08Z,llama,https://github.com/meta-llama/llama/pull/703 702,1860547761,Fine-tuning: llama-2-13b-chat,"For fine-tuning of the large language models (llama2), what should be the and structure (like should be an excel or docs file or prompt and response or instruction and output) of the training dataset? And also how to prepare or organise the tabular dataset for training purpose?",2023-08-22T04:55:09Z,llama,https://github.com/meta-llama/llama/issues/702 701,1859218693,Incorrect inference after PEFT (QLoRA),"Facing an issue while tuning LLAMA-2-7b-chat on which I request some suggestions. 1. I use a specific system prompt that defines some keys, and then provide an instruction and ask the model to generate a JSON output with these keys. I am using 7b-chat model. Even with 5 examples, the output is fine. 2. When I take 1000 such examples and use PEFT-QLoRA to tune it (each sample consists of system prompt, instruction and output in LLAMA-2 prompt structure), I do not get proper results. What could be the issue here? 1. Is it correct to use System Prompt, Instruction and Output in LLAMA-2 prompt structure ( )? Or should I be using something else? 3. For this exercise, should 7b-chat be used or 7b? 4. Could quantization be leading to a issue here? Why would I not get the expected output even with tuning the model with 1000 examples? Thanks in advance.",2023-08-21T11:57:45Z,llama,https://github.com/meta-llama/llama/issues/701 700,1859201619,the result of llama-2 on benchmarks,"i am trying to reproduce the result of llama-2 on benchmarks. Due to device limitations, I tried llama-2-7b on boolq dataset.However,the result is far away from paper. Under zero-shot setting,I just get em score of 36.81 while it is reported 77.4 in paper. Is it because of the prompt? Is there anyone who has reproduced the success and can give me some help? Thanks a lot !",2023-08-21T11:46:48Z,llama,https://github.com/meta-llama/llama/issues/700 699,1858977833,Cant use Windows as fire packacge requires NCCL,"Hi, How can I use this system on Windows when it can only be run with NCCL ? The instructions require a lot of changing for this - the example script can not be without switching the backend to goo from NCCL RuntimeError: Distributed package doesn't have NCCL built in ERROR failed (exitcode: 1) local_rank: 0 (pid: 23152) of binary: Then you can't use the torchrun.exe as the example shows, as this due to unable to **start a process**, you have to use python -m torchrun-script As the above python script will actually run. Has anyone tried these instructions with Windows 10 Pro , using CUDA, and a NVidia GPU ? ",2023-08-21T09:30:25Z,llama,https://github.com/meta-llama/llama/issues/699 698,1858667836,down load wrong ,"Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 7B download.sh: 12: [[: not found Downloading LICENSE and Acceptable Usage Policy download.sh: 17: Bad substitution ",2023-08-21T06:10:58Z,llama,https://github.com/meta-llama/llama/issues/698 696,1858174363,pipeline vs model.generate,"I tried the text generation on llama 13B with following two implementations (both in huggingface format): **model.generate** **pipeline** I found that **model.generate** is faster but consume more GPU memory. E.g., for a certain input instance (batch size = 1), the comparison of inference time is like: | **model** | **pipeline** | | --- | ----------- | | 4 s | 10 s | On a single 80G GPU, I can set following max batch size without OOM: **model**: 2 **pipeline**: 4 Any idea and suggestion?",2023-08-20T16:06:33Z,llama,https://github.com/meta-llama/llama/issues/696 695,1858055322,make download.sh executable, ,2023-08-20T09:24:07Z,llama,https://github.com/meta-llama/llama/pull/695 694,1858013516,failed to create process,"Hi, I am using Windows 10, with a conda env, and have CUDA and everything else setup and working as per the instructions. When I try to run my first example I get a failed to start process ` (llama2env) --nproc_per_node 1 example_text_completion.py failed to create process. ` I can't even get to type in the second line of the script, or if I put the whole thing on one line in my cmd.exe torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 I still get the same error **failed to create process.** I can run the tests to get back a tensor ` import torch x = torch.rand(5, 3) print(x) tensor([[0.5495, 0.0281, 0.2566], [0.7032, 0.1296, 0.8173], [0.0329, 0.5500, 0.3025], [0.6790, 0.0561, 0.3389], [0.4403, 0.5365, 0.5513]]) ` And also - ensure CUDA and my 4090 GPU is available >>> torch.cuda.is_available() True The output from **conda info** ` active environment : llama2env active env location : shell level : 1 user config file : populated config files : conda version : 22.9.0 conda-build version : not installed python version : 3.8.17.final.0 virtual packages : __cuda=12.2=0 __win=0=0 __archspec=1=x86_64 base environment : (writable) conda av data dir : conda av metadata url : None channel URLs : package cache : envs directories : platform : win-64 user-agent : administrator : False netrc file : None offline mode : False ` Python 3.11.4 version installed in this conda environment (llama2env) Python 3.11.4 | packaged by Anaconda, Inc. | (main, Jul 5 2023, 13 18) [MSC v.1916 64 bit (AMD64)] on win32 Type ""help"", ""copyright"", ""credits"" or ""license"" for more information. And the list of the Python 3.11.4 packages installed in this conda env `(llama2env) list # packages in environment at U # # Name Version Build Channel blas 1.0 mkl brotlipy 0.7.0 py311h2bbff1b_1002 bzip2 1.0.8 he774522_0 ca-certificates 2023.05.30 haa95532_0 certifi 2023.7.22 py311haa95532_0 cffi 1.15.1 py311h2bbff1b_3 charset-normalizer 2.0.4 pyhd3eb1b0_0 cryptography 41.0.2 py311hac1b9e3_0 cuda-cccl 12.2.128 0 nvidia cuda-cudart 11.8.89 0 nvidia cuda-cudart-dev 11.8.89 0 nvidia cuda-cupti 11.8.87 0 nvidia cuda-libraries 11.8.0 0 nvidia cuda-libraries-dev 11.8.0 0 nvidia cuda-nvrtc 11.8.89 0 nvidia cuda-nvrtc-dev 11.8.89 0 nvidia cuda-nvtx 11.8.86 0 nvidia cuda-profiler-api 12.2.128 0 nvidia cuda-runtime 11.8.0 0 nvidia fairscale 0.4.13 pypi_0 pypi filelock 3.9.0 py311haa95532_0 fire 0.5.0 pypi_0 pypi freetype 2.12.1 ha860e81_0 giflib 5.2.1 h8cc25b3_3 idna 3.4 py311haa95532_0 intel-openmp 2023.1.0 h59b6b97_46319 jinja2 3.1.2 py311haa95532_0 jpeg 9e h2bbff1b_1 lerc 3.0 hd77b12b_0 libcublas 11.11.3.6 0 nvidia libcublas-dev 11.11.3.6 0 nvidia libcufft 10.9.0.58 0 nvidia libcufft-dev 10.9.0.58 0 nvidia libcurand 10.3.3.129 0 nvidia libcurand-dev 10.3.3.129 0 nvidia libcusolver 11.4.1.48 0 nvidia libcusolver-dev 11.4.1.48 0 nvidia libcusparse 11.7.5.86 0 nvidia libcusparse-dev 11.7.5.86 0 nvidia libdeflate 1.17 h2bbff1b_0 libffi 3.4.4 hd77b12b_0 libnpp 11.8.0.86 0 nvidia libnpp-dev 11.8.0.86 0 nvidia libnvjpeg 11.9.0.86 0 nvidia libnvjpeg-dev 11.9.0.86 0 nvidia libpng 1.6.39 h8cc25b3_0 libtiff 4.5.0 h6c2663c_2 libuv 1.44.2 h2bbff1b_0 libwebp 1.2.4 hbc33d0d_1 libwebp-base 1.2.4 h2bbff1b_1 llama 0.0.1 dev_0 lz4-c 1.9.4 h2bbff1b_0 markupsafe 2.1.1 py311h2bbff1b_0 mkl 2023.1.0 h6b88ed4_46357 mkl-service 2.4.0 py311h2bbff1b_1 mkl_fft 1.3.6 py311hf62ec03_1 mkl_random 1.2.2 py311hf62ec03_1 mpmath 1.3.0 py311haa95532_0 networkx 3.1 py311haa95532_0 numpy 1.25.2 py311hdab7c0b_0 numpy-base 1.25.2 py311hd01c5d8_0 openssl 3.0.10 h2bbff1b_0 pillow 9.4.0 py311hd77b12b_0 pip 23.2.1 py311haa95532_0 pycparser 2.21 pyhd3eb1b0_0 pyopenssl 23.2.0 py311haa95532_0 pysocks 1.7.1 py311haa95532_0 python 3.11.4 he1021f5_0 pytorch 2.0.1 py3.11_cuda11.8_cudnn8_0 pytorch pytorch-cuda 11.8 h24eeafa_5 pytorch pytorch-mutex 1.0 cuda pytorch requests 2.31.0 py311haa95532_0 sentencepiece 0.1.99 pypi_0 pypi setuptools 68.0.0 py311haa95532_0 six 1.16.0 pypi_0 pypi sqlite 3.41.2 h2bbff1b_0 sympy 1.11.1 py311haa95532_0 tbb 2021.8.0 h59b6b97_0 termcolor 2.3.0 pypi_0 pypi tk 8.6.12 h2bbff1b_0 torchaudio 2.0.2 pypi_0 pypi torchvision 0.15.2 pypi_0 pypi typing_extensions 4.7.1 py311haa95532_0 tzdata 2023c h04d1e81_0 urllib3 1.26.16 py311haa95532_0 vc 14.2 h21ff451_1 vs2015_runtime 14.27.29016 h5e58377_2 wheel 0.38.4 py311haa95532_0 win_inet_pton 1.1.0 py311haa95532_0 xz 5.4.2 h8cc25b3_0 zlib 1.2.13 h8cc25b3_0 zstd 1.5.5 hd43e919_0` ",2023-08-20T07:34:18Z,llama,https://github.com/meta-llama/llama/issues/694 693,1857881011,Llama v1, ,2023-08-19T21:42:37Z,llama,https://github.com/meta-llama/llama/pull/693 692,1857663845,"Why are positional encodings only applied to Queries and Keys, but not Values?","As per title, why are positional encodings only applied to the query and the keys, but not to the values? I have given an interpretation of my own in this comment, which I quote, but this is a personal hypothesis, not the result of a research project. > As you know from the Transformer theory, adding positional encodings introduces spatial information in the model. The attention formula is . As you can see, the output of the is a tensor with shape , so you can think of it as a tensor of ""scores"" (in fact the output of the is used to visualize the attention) that represents the ""intensity"" of how much two tokens are related to each other and thus ""amplifies"" some tokens in the V tensor and does not amplify others based on the scores produced by the . Since the output of the is a matrix that represents the ""intensity"" of how much two tokens are related to each other, that's also the reason we apply the mask to it to disable some interactions when we want to make the model causal. So, the sole goal of the ""query"" and the ""keys"" is to decide which value to amplify in the V tensor to produce the output attention. This is to say that the positional encoding added only in the query and the keys are enough in deciding which value to ""amplify"" in the output of the attention. The value tensor is ""passive"" in this process, so that's why the model performance does not degrade if you don't encode positional encodings in the V matrix. If this is not clear, I suggest you watch the following video from the minute 35:39 onward ( You may argue that the V matrix does not contain any positional information. so how can the FFN understand it? My hypothesis is that the information captured by the ""scores"" of the attention is enough to be conveyed to the output of the attention through its multiplication with V. Is there a study on why this works? I am referring to the following [piece of code](https In the vanilla self-attention, positional encodings (which were absolute) were applied to all the 3 matrices, Q, K and V. ",2023-08-19T10:20:54Z,llama,https://github.com/meta-llama/llama/issues/692 691,1857662258,JSONDecodeError: generation_config.json on Llama-2-13b-hf model repo," on should be . ",2023-08-19T10:14:26Z,llama,https://github.com/meta-llama/llama/issues/691 690,1857614117,Delay in Receiving Llama-2 Model Download,"I am writing to report an issue regarding the delayed delivery of the Llama-2 model download email from Meta's open-source resources. I have followed the necessary steps by filling out the form on the Meta AI Resources Page- to access the Llama-2 model. However, despite the typical waiting time of 2 hours to 2 days, it has been over 10 days now, and I have not received the model download email. I have attempted to rectify this situation by using a different email address as well, but unfortunately, I have not received any response or download link on that email either.",2023-08-19T07:14:41Z,llama,https://github.com/meta-llama/llama/issues/690 689,1856766882,"Is it possible to use llama for sentimental analysis classification tasks? If so, how should we adjust it? ", ,2023-08-18T13:53:47Z,llama,https://github.com/meta-llama/llama/issues/689 688,1856660836,Max output token length 2048? ,"Hi, I've been looking for documentation that describes the max output token length for Llama. I've been running Vicuna 13b, and and running into the token length issue as I try to summarize information from many documents. Do you know where the max token limit is documented? Also, is there any way around it if I needed to say generate a 3 page table that listed some entities extracted from text and the sentences they are contained within?",2023-08-18T12:42:11Z,llama,https://github.com/meta-llama/llama/issues/688 687,1856587645,Llama-2-7b-chat-hf will not allow temperature to be 0.0,"I am running the Llama-2-7b-chat-hf model on Huggingface. When I set temperature=0.0 or temperature=0, I get temperature Until a week ago, It was working with the same code and environment. My code and error message; ",2023-08-18T11:48:10Z,llama,https://github.com/meta-llama/llama/issues/687 686,1856112026,Fine-tuning the model like LLama2-70B,thanks!,"Hello!There are few tutorials on fine-tuning this large model LLama2-70B. What instruction should I use to fine tune it(like Lora)? **GPU**:16 * A10(16 * 24G) **Data**:10,000+ pieces of data,like:{**""instruction""**: ""Summarize this Ethereum transaction."",**""input""**: "" '2023-08-15 04 47', 'from_address': '0xc16a101973403a71a1d42fdc12fed3c5f45e5bfe', 'from_address_label': 'topensdoc.eth', 'to_address': '0x0a791089acf48912a9cfde00e3a6afe9edbc3221', ......."",**""output""**: ""summary text"",} Can you give some advice, thank you very much?If you have some idea,, what is the recommended setting for the epoch parameter for data of this order of magnitude? If fine-tuning is successful, I will also disclose my steps in the community.Thanks!",2023-08-18T06:07:02Z,llama,https://github.com/meta-llama/llama/issues/686 685,1856032266,"run 70b error is RuntimeError: shape '[1, 28, 64, 128]' is invalid for input of size 28672 ","13b can run, but 70b can't work My hardware device is eight gtx-4090 `python Traceback (most recent call last): File line 15, in sequences = pipeline( File line 201, in __call__ return super().__call__(text_inputs, **kwargs) File line 1120, in __call__ return self.run_single(inputs, preprocess_params, forward_params, postprocess_params) File line 1127, in run_single model_outputs = self.forward(model_inputs, **forward_params) File line 1026, in forward model_outputs = self._forward(model_inputs, **forward_params) File line 263, in _forward generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs) File line 115, in decorate_context return func(*args, **kwargs) File line 1572, in generate return self.sample( File line 2619, in sample outputs = self( File line 1501, in _call_impl return forward_call(*args, **kwargs) File line 165, in new_forward output = old_forward(*args, **kwargs) File line 688, in forward outputs = self.model( File line 1501, in _call_impl return forward_call(*args, **kwargs) File line 578, in forward layer_outputs = decoder_layer( File line 1501, in _call_impl return forward_call(*args, **kwargs) File line 165, in new_forward output = old_forward(*args, **kwargs) File line 292, in forward hidden_states, self_attn_weights, present_key_value = self.self_attn( File line 1501, in _call_impl return forward_call(*args, **kwargs) File line 165, in new_forward output = old_forward(*args, **kwargs) File line 195, in forward key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2) RuntimeError: shape '[1, 28, 64, 128]' is invalid for input of size 28672 ` File line 195, in forward key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2) RuntimeError: shape '[1, 28, 64, 128]' is invalid for input of size 28672",2023-08-18T04:24:46Z,llama,https://github.com/meta-llama/llama/issues/685 684,1855960994,error: is not divisible by when 70b-chat inference,"When run 70b-chat on 7 GPUs, errors occurred: AssertionError:8192 is not divisible by 7. What does 8192 mean? What should I do? environment: Ubuntu 20.04.4 pytorch 2.0.1 CUDA 11.6 GPU: NVIDIA A30 24GB*7 config: torchrun --nproc_per_node 7 example_chat_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 4 addition: 13b-chat on 2 GPUs has been successful. If I set nproc_per_node=6 or 8, following errors always occurred: Loading a checkpoint for MP=7 but world size is 6 (or 8).",2023-08-18T02:49:50Z,llama,https://github.com/meta-llama/llama/issues/684 682,1855755422,upload script can not be restarted properly when a wget failed in the middle - get 403 forbidden errors,"My ISP seems to not like lots of large 16GB downloads, so some transfers are terminated after a few gigabytes. I found that when I try to restart (with wget -c), I get 403 errors. My hypothesis is that the llama download server thinks the aborted transfer finished OK and marked the key unusable (for this file). I've been asking for new keys to download the straggler files, but it's highly inconvenient. Am I right with my suspicion ? Is there something that can be done about it ? If this report is not clear enough, I'm happy to give more detail.",2023-08-17T21:52:28Z,llama,https://github.com/meta-llama/llama/issues/682 681,1855424339,Llama, ,2023-08-17T17:33:52Z,llama,https://github.com/meta-llama/llama/issues/681 680,1854932697,error when adding tokens to Llama2,"Before I added words to the vocabulary, everything was fine. However, once I added new words, many words turned into ""unk"", with index 0. Here is an example: ",2023-08-17T12:44:54Z,llama,https://github.com/meta-llama/llama/issues/680 678,1854466329,Allows download.sh for resuming broken downloads,"When downloading models, there is a possibility of interruptions or failures due to network disconnections or insufficient disk space. To address this, I use command, which allows for resuming broken downloads. This would greatly enhance the user experience when downloading models.",2023-08-17T07:53:13Z,llama,https://github.com/meta-llama/llama/pull/678 677,1854173590,pip install -e . failed,"I tried to install fairscale via proxy with the following code. It returns error logs: I've also tried and but neither worked. It seems that the package had been downloaded but couldn't be installed. Thus, it may not be the problem of proxy. Below is my environment. ",2023-08-17T03:16:41Z,llama,https://github.com/meta-llama/llama/issues/677 675,1853405163,Llama-2 prompts for single word answer,"I am working on a use case using Llama-2 wherein the model is given a prompt (contains context and question) and the model should give answer for the provided question using the context only. I want the model to respond only with the required answer and no additional text. For eg: Current output - ""Based on the context given the name of the actor is Daniel"" Expected output - ""Daniel"" Can someone provide me some prompt examples that can help me achieve output in the above required format.",2023-08-16T14:50:23Z,llama,https://github.com/meta-llama/llama/issues/675 674,1851300650,Confusion about the implementation for generating the first token,"Hey guys, I am confused about the following code at generation.py, starts at line 136: ` for cur_pos in range(min_prompt_len, total_len): logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos) if logprobs: token_logprobs[:, prev_pos + 1 : cur_pos + 1] = -F.cross_entropy( input=logits.transpose(1, 2), target=tokens[:, prev_pos + 1 : cur_pos + 1], reduction=""none"", ignore_index=pad_id, )` Here we see the **cur_pos** starts at the **min_prompt_len**, It actually means that the prompt with the **min_prompt_len** will be processed differently (i.e., the attention mechnism) from other prompt at the first iteration. Specifically, it takes all the promts to compute the first token for the prompt with the **min_prompt_len**, while for other prompts it only takes the previous one token for computing the frist token. I wonder if it is ok to do so or just a trick (I also wonder if other large language model use this trick, too)",2023-08-15T11:45:17Z,llama,https://github.com/meta-llama/llama/issues/674 673,1851139514,Running Llama2 models locally on Windows10,"I got approval from meta, then I downloaded all meta Llama2 models locally(I followed all steps and everything was fine). I tried to run the model 7B using this command “torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4” (as mentioned on Llama2 GitHub), but when I run this command I got this error “Distributed package doesn’t have NCCL built in”, So I tried to download NCCL 2.16.5 that could support cuda 11.8, but still not working. ** My environment : - Windows10, Nvidia Geforce RTX 3090 - CUDA11.8 - torch 2.0.1+cu118 - When I run ""torch.__version __"" I got 2.0.1+cu118 - When I run “torch.cuda.is_available()” I got True - When I run “nvcc --version” I got Build cuda_11.8.r11.8 These screenshots may help to understand more the problem : I want to find a way to run the script. I’m blocked for 3 days, please help if there a solution. Thanks in advance. ",2023-08-15T09:19:47Z,llama,https://github.com/meta-llama/llama/issues/673 672,1850901201,Update download.sh, ,2023-08-15T04:49:52Z,llama,https://github.com/meta-llama/llama/pull/672 671,1850212608,"Making llama text generation, deterministic","This is a copy of an issue from hugging face hub since no response was received over there Following the text generation code template there, I’ve been trying to generate some outputs from llama2 but running into stochastic generations. For instance, running the same prompt through the model.generate() twice results in two different outputs as shown in the example below. I’ve used model.generate() with other LLMs (e.g., flant5) with the other parameters remaining the same and have obtained deterministic outputs. Also tried AutoModelForCausalLM instead of LLamaForCausalLM but still got different outputs each time for the same prompt. How do I make sure I get the same text generated each time? Code to reproduce: ",2023-08-14T17:21:11Z,llama,https://github.com/meta-llama/llama/issues/671 670,1849360825,Counting tokens for Chat models,"Does anyone how to calculate prompt and completion tokens for Llama Chat models for monitoring purposes? Can we add this in responses as many times we don't have libraries to achieve this in languages like java, kotlin, etc. Similar to tiktoken by openai - ",2023-08-14T09:12:28Z,llama,https://github.com/meta-llama/llama/issues/670 669,1849332115,"Is it possible to train the ""llamav2 7b"" model on a custom dataset using Google Colab Pro, which offers approximately 32 GB of system RAM and 16 GB of GPU RAM?",Llamav2 7B required 24GB System RAM or GPU RAM ?,2023-08-14T08:57:17Z,llama,https://github.com/meta-llama/llama/issues/669 668,1849031039,Request body format for chat models?,"Not sure if this is answered somewhere, what is the proper request body format for chat models like Llama 2 13B Chat. The following body works for me, but I'm not sure why inputs is a list of lists. Also keen to know what all parameters are available. `{ ""inputs"": [ [ { ""role"": ""system"", ""content"": ""You are a helpful assistant"" }, { ""role"": ""user"", ""content"": ""hi"" } ] ], ""parameters"": { ""max_new_tokens"": 10 } }` ",2023-08-14T05:31:46Z,llama,https://github.com/meta-llama/llama/issues/668 666,1847610378,llama2 for image captioning,"hi there, is there any example (if possible) to use Llama2 for image captioning ? thank you ",2023-08-12T00:54:49Z,llama,https://github.com/meta-llama/llama/issues/666 665,1847277542,Running LLaMA-2-7B on 8x K80 GPUs,"I am trying to run the model on an AWS EC2 p2.8xlarge instance with 8x Nvidia Tesla K80 GPUs, each with 12 GB VRAM (for a total of 96 GB). I set , but then I get this error message: AssertionError: Loading a checkpoint for MP=1 but world size is 8",2023-08-11T18:46:31Z,llama,https://github.com/meta-llama/llama/issues/665 663,1846423460,working with huggingface Llama 2 13b chat hf model_kwargs value error,"using Llama 2 13b chat hf model ( with 4bit quantization (bitsandbytes) getting an error in the following code.. it used to work earlier generate_text = transformers.pipeline( model=model, tokenizer=tokenizer, return_full_text=True, # langchain expects the full text task='text-generation', temperature=0.0, max_new_tokens=2000, repetition_penalty=1.1 ) ValueError: The following 'model_kwargs' are not used by the model: ['max_new_token'','repetition_policy'] (note:Typos in the generate arguments will also show up in this list) whole code ------- from torch import cuda, bfloat16 import transformers model_id = device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu' print(f""Device avialble is on {device}"") bnb_config = transformers.BitsAndBytesConfig( load_in_4bit=True, bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_compute_dtype=bfloat16 ) hf_auth = '####' model_config = transformers.AutoConfig.from_pretrained( model_id, use_auth_token=hf_auth ) tokenizer = transformers.AutoTokenizer.from_pretrained( model_id, use_auth_token=hf_auth ) model = transformers.AutoModelForCausalLM.from_pretrained( model_id, trust_remote_code=True, config=model_config, quantization_config=bnb_config, device_map='auto', use_auth_token=hf_auth )",2023-08-11T08:32:12Z,llama,https://github.com/meta-llama/llama/issues/663 662,1846277704,Error when running example_chat_completion.py with torchrun,"Why i have this error when i try to run llama-7b on windows (CPU: i5-7300HQ , memory:24576MB RAM): >torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir llama-2-7b --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 4 NOTE: Redirects are currently not supported in Windows or MacOs. [W [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - LÆadresse demandÚe nÆest pas valide dans son contexte.). [W [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - LÆadresse demandÚe nÆest pas valide dans son contexte.). [W [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - LÆadresse demandÚe nÆest pas valide dans son contexte.). [W [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - LÆadresse demandÚe nÆest pas valide dans son contexte.). > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Traceback (most recent call last): File line 73, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( ^^^^^^^^^^^^^^^^^^^^ File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^ File line 20, in main generator = Llama.build( ^^^^^^^^^^^^ File line 114, in build model = Transformer(model_args) ^^^^^^^^^^^^^^^^^^^^^^^ File line 269, in __init__ self.layers.append(TransformerBlock(layer_id, params)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 232, in __init__ self.feed_forward = FeedForward( ^^^^^^^^^^^^ File line 211, in __init__ self.w1 = ColumnParallelLinear( ^^^^^^^^^^^^^^^^^^^^^ File line 262, in __init__ self.weight = Parameter(torch.Tensor(self.output_size_per_partition, self.in_features)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ RuntimeError: [enforce fail at data. DefaultCPUAllocator: not enough memory: you tried to allocate 90177536 bytes. ERROR failed (exitcode: 1) local_rank: 0 (pid: 10568) of binary: Traceback (most recent call last): File """", line 198, in _run_module_as_main File """", line 88, in _run_code File line 7, in File line 346, in wrapper return f(*args, **kwargs) ^^^^^^^^^^^^^^^^^^ File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_chat_completion.py FAILED ------------------------------------------------------------",2023-08-11T06:39:52Z,llama,https://github.com/meta-llama/llama/issues/662 661,1845932694,[TESTING] 2d sharding,[TESTING] 2d sharding,2023-08-10T21:38:20Z,llama,https://github.com/meta-llama/llama/pull/661 660,1845742267,What is the prompt format when use Llama-2-70b-chat-hf? ,What is the prompt format when using Llama-2-70b-chat-hf? The symbols like <> is not supported by the hugging face tokenizer. It seems we can't use the format given py example_chat_completion.py.,2023-08-10T19:08:33Z,llama,https://github.com/meta-llama/llama/issues/660 659,1845520996,example_text_completion.py doesn't work with Python 3.10,"I try install llama package and I have the following error: `pip install llama Defaulting to user installation because normal site-packages is not writeable Collecting llama Using cached llama-0.1.1.tar.gz (387 kB) Preparing metadata (setup.py) ... error error: subprocess-exited-with-error × python setup.py egg_info did not run successfully. │ exit code: 1 ╰─> [6 lines of output] Traceback (most recent call last): File """", line 2, in File """", line 34, in File line 6, in NameError: name 'execfile' is not defined [end of output] note: This error originates from a subprocess, and is likely not a problem with pip. error: metadata-generation-failed × Encountered error while generating package metadata. ╰─> See above for output. note: This is an issue with the package mentioned above, not pip. hint: See above for details. ` I try to convert with 2to3 and I have the following error: `pip install llama-0.1.1-1.tar.gz Processing Preparing metadata (setup.py) ... error error: subprocess-exited-with-error × python setup.py egg_info did not run successfully. │ exit code: 1 ╰─> [7 lines of output] Traceback (most recent call last): File """", line 2, in File """", line 34, in File line 7, in globals()) File """", line 1, in ImportError: attempted relative import with no known parent package [end of output] note: This error originates from a subprocess, and is likely not a problem with pip. error: metadata-generation-failed × Encountered error while generating package metadata. ╰─> See above for output. note: This is an issue with the package mentioned above, not pip. ` Suggestions please.",2023-08-10T16:30:28Z,llama,https://github.com/meta-llama/llama/issues/659 658,1844823197,Confusion about the default max_seq_len = 2048,"When reading the class Transformer, I found that the code use max_seq_len * 2 to prepare the rotary positional encoding, which confused me for a while. Then I realized that the default max_seq_len was set to 2048, and the 'max_seq_len * 2' aims to generate 4096 positional embeddings, corresponding to the 4K context length in the paper. I understand it can achieve the purpose but why not setting max_seq_len directly to 4096? which is more clear and less likely to cause misconception. self.freqs_cis = precompute_freqs_cis( self.params.dim self.params.n_heads, self.params.max_seq_len * 2 )",2023-08-10T09:44:49Z,llama,https://github.com/meta-llama/llama/issues/658 656,1844763158,what is the usage of safetensors? ,Safetensors seems not very useful when using model weights to generate texts. Are they reward model or any other parts of the Llama2 pipelines?,2023-08-10T09:18:42Z,llama,https://github.com/meta-llama/llama/issues/656 655,1844662387,wget: unable to resolve host address ‘download.llamameta.net’,"error log: I modify download.sh to download files with proxy as shown below: How can I download llama 2 with proxy?",2023-08-10T08:15:41Z,llama,https://github.com/meta-llama/llama/issues/655 654,1844213563,"AssertionError: (6, 4) with example_chat_completion.py","Hello! I'm getting an error when running the script. Any help is much appreciated. Thank you! Error: ",2023-08-10T00:20:53Z,llama,https://github.com/meta-llama/llama/issues/654 653,1843621668,"Connecting to download.llamameta.net ... connected -> HTTP request sent, awaiting response... 403 Forbidden","when I had md5sum, i got a message that Checking checksums md5sum: checklist.chk: no properly formatted checksum lines found now I installed md5sha1sum (i had to unlink coreutils). Now it keeps running and never gives me an output after the message: ""Checking checksums"" ",2023-08-09T16:24:13Z,llama,https://github.com/meta-llama/llama/issues/653 652,1843620280,"Connecting to download.llamameta.net (download.llamameta.net)|65.8.20.65|:443... connected.HTTP request sent, awaiting response... 403 Forbidden","when I had md5sum, i got a message that Checking checksums md5sum: checklist.chk: no properly formatted checksum lines found now I installed md5sha1sum (i had to unlink coreutils). Now it keep running and never gives me an output after the message: ""Checking checksums"" ",2023-08-09T16:23:16Z,llama,https://github.com/meta-llama/llama/issues/652 651,1843201913,LLaMa does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.," `pip3 install -e . Defaulting to user installation because normal site-packages is not writeable Obtaining ERROR: does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found. `",2023-08-09T13:07:39Z,llama,https://github.com/meta-llama/llama/issues/651 650,1842892348,fix line separators in download.sh for wsl2,"The original download.sh uses as line separator. On windows by using the shell script can not be executed as seen in . I replaced the line separators, thus it can be downloaded from windows by using without any problems.",2023-08-09T09:58:48Z,llama,https://github.com/meta-llama/llama/pull/650 649,1842628885,"In meta-llama/Llama-2-7b-hf i got issue after staring fine tunning is that ValueError: `do_sample` is set to `False`. However, temperature is set to 0.9 -- this flag is only used in sample-based generation modes. Set `do_sample=True` or unset temperature to continue.","2023-08-09 07 00.709891: W TF-TRT Warning: Could not find TensorRT > INFO Running LLM > INFO Params: Namespace(version=False, train=True, deploy=False, inference=False, train_split='train', valid_split=None, text_column='text', learning_rate=0.0002, num_train_epochs=3, train_batch_size=4, eval_batch_size=4, warmup_ratio=0.1, gradient_accumulation_steps=1, optimizer='adamw_torch', scheduler='linear', weight_decay=0.0, max_grad_norm=1.0, seed=42, add_eos_token=False, block_size=-1, use_peft=True, lora_r=16, lora_alpha=32, lora_dropout=0.05, training_type='generic', train_on_inputs=False, logging_steps=-1, project_name='my-chat', evaluation_strategy='epoch', save_total_limit=1, save_strategy='epoch', auto_find_batch_size=False, fp16=False, push_to_hub=True, use_int8=False, model_max_length=2048, use_int4=True, trainer='sft', target_modules=None, func=) > INFO loading dataset from csv Using pad_token, but it is not set yet. Loading checkpoint shards: 100% [01 00, Traceback (most recent call last): File line 8, in sys.exit(main()) File line 36, in main command.run() File line 409, in run train_llm(params) File line 115, in train model = AutoModelForCausalLM.from_pretrained( File line 511, in from_pretrained return model_class.from_pretrained( File line 2971, in from_pretrained model.generation_config = GenerationConfig.from_pretrained( File line 689, in from_pretrained return cls.from_dict(config_dict, **kwargs) File line 722, in from_dict config = cls(**{**config_dict, **kwargs}) File line 316, in __init__ self.validate() File line 354, in validate raise ValueError( ValueError: is set to . However, temperature is set to 0.9 -- this flag is only used in sample-based generation modes. Set or unset temperature to continue. ""Even there is no parameter in autotrain called do_sample and temperature """,2023-08-09T07:09:58Z,llama,https://github.com/meta-llama/llama/issues/649 648,1842553208,ValueError: do_sample is set to False. when Loading Llama 2 7b chat,"ValueError: do_sample is set to False. However, temperature is set to 0.9 – this flag is only used in sample-based generation modes. Set do_sample=True or unset temperature to continue. my code is: from transformers import GenerationConfig, LlamaForCausalLM, LlamaTokenizer MODEL_NAME = bnb_config = BitsAndBytesConfig( load_in_4bit=True, bnb_4bit_use_double_quant=True, bnb_4bit_quant_type=“nf4”, bnb_4bit_compute_dtype=torch.bfloat16, ) model = LlamaForCausalLM.from_pretrained( MODEL_NAME, device_map=“auto”, trust_remote_code=True, use_auth_token=True, temperature=0.1, do_sample=True, quantization_config=bnb_config, ) tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME) tokenizer.pad_token = tokenizer.eos_token Also I explicitly changed the do_sample=True in configuration_utils.py but didn`t worked",2023-08-09T06:07:59Z,llama,https://github.com/meta-llama/llama/issues/648 645,1842285505,model's default max_seq_len set to 512 when the model is trained to accommodate 4096 ,"Printing in : shows that it is set to 512, when the technical report says it is trained to accommodate 4096. I realized this as many responses that required a long response were getting cut off mid-sentence. the quick fix was to do the following: ",2023-08-09T00:49:09Z,llama,https://github.com/meta-llama/llama/issues/645 644,1842094295,Update readme.md,Default batch_size should be 6 in example_text_completion.py due to bsz = len(prompt_tokens) where bsz is 6.,2023-08-08T21:00:49Z,llama,https://github.com/meta-llama/llama/pull/644 643,1841359317,unknown cause please help,python: can't open file 'C [Errno 2] No such file or directory,2023-08-08T13:47:43Z,llama,https://github.com/meta-llama/llama/issues/643 642,1841090807,torchrun,"Unable to run torchrun --nnodes 1 --nproc_per_node 4 llama_finetuning.py --enable_fsdp --use_peft --peft_method lora --model_name --pure_bf16 --output_dir running on Python is 3.11.4 Do I need to use a lower version of Python say 3.7 Please your assistance. Much appreciated. Ben",2023-08-08T11:06:58Z,llama,https://github.com/meta-llama/llama/issues/642 641,1841090739,Use./download.sh to download the wrong file size,"Don't know what's wrong The details are as follows: Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 13B Downloading LICENSE and Acceptable Usage Policy Permission denied Permission denied Downloading tokenizer Permission denied Permission denied md5sum: tokenizer_checklist.chk: no properly formatted MD5 checksum lines found Downloading llama-2-13b Permission denied Permission denied Permission denied Permission denied Checking checksums md5sum: checklist.chk: no properly formatted MD5 checksum lines found sysadmin sudo bash download.sh Enter the URL from email: Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 13B Downloading LICENSE and Acceptable Usage Policy --2023-08-08 19 34-- Resolving huggingface.co (huggingface.co)... 18.164.174.17, 18.164.174.23, 18.164.174.55, ... Connecting to huggingface.co (huggingface.co)|18.164.174.17|:443... connected. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 36-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 37-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 200 OK Length: 16134 (16K) Saving to: 100%[=========================================================================>] 15.76K in 0s 2023-08-08 19 37 (219 - saved --2023-08-08 19 37-- Resolving huggingface.co (huggingface.co)... 18.164.174.118, 18.164.174.17, 18.164.174.23, ... Connecting to huggingface.co (huggingface.co)|18.164.174.118|:443... connected. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 40-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 40-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 200 OK Length: 16134 (16K) Saving to: 100%[=========================================================================>] 15.76K in 0s 2023-08-08 19 41 (127 - saved Downloading tokenizer --2023-08-08 19 41-- Resolving huggingface.co (huggingface.co)... 18.164.174.17, 18.164.174.23, 18.164.174.55, ... Connecting to huggingface.co (huggingface.co)|18.164.174.17|:443... connected. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 43-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 44-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 200 OK Length: 16134 (16K) Saving to: 100%[=========================================================================>] 15.76K in 0s 2023-08-08 19 44 (88.4 - saved --2023-08-08 19 44-- Resolving huggingface.co (huggingface.co)... 18.164.174.55, 18.164.174.118, 18.164.174.17, ... Connecting to huggingface.co (huggingface.co)|18.164.174.55|:443... connected. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 46-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 47-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 200 OK Length: 16134 (16K) Saving to: 100%[=========================================================================>] 15.76K in 0s 2023-08-08 19 47 (161 - saved md5sum: tokenizer_checklist.chk: no properly formatted MD5 checksum lines found Downloading llama-2-13b --2023-08-08 19 47-- Resolving huggingface.co (huggingface.co)... 18.164.174.23, 18.164.174.55, 18.164.174.118, ... Connecting to huggingface.co (huggingface.co)|18.164.174.23|:443... connected. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 49-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 49-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 200 OK Length: 16134 (16K) Saving to: 100%[=========================================================================>] 15.76K in 0.1s 2023-08-08 19 50 (153 - saved --2023-08-08 19 50-- Resolving huggingface.co (huggingface.co)... 18.164.174.17, 18.164.174.23, 18.164.174.55, ... Connecting to huggingface.co (huggingface.co)|18.164.174.17|:443... connected. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 52-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 52-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 200 OK Length: 16134 (16K) Saving to: 100%[=========================================================================>] 15.76K in 0s 2023-08-08 19 53 (152 - saved --2023-08-08 19 53-- Resolving huggingface.co (huggingface.co)... 18.164.174.118, 18.164.174.17, 18.164.174.23, ... Connecting to huggingface.co (huggingface.co)|18.164.174.118|:443... connected. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 55-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 55-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 200 OK Length: 16134 (16K) Saving to: 100%[=========================================================================>] 15.76K in 0s 2023-08-08 19 56 (85.7 - saved --2023-08-08 19 56-- Resolving huggingface.co (huggingface.co)... 18.164.174.55, 18.164.174.118, 18.164.174.17, ... Connecting to huggingface.co (huggingface.co)|18.164.174.55|:443... connected. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 59-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-08-08 19 59-- Reusing existing connection to huggingface.co:443. HTTP request sent, awaiting response... 200 OK Length: 16134 (16K) Saving to: 100%[=========================================================================>] 15.76K in 0.4s 2023-08-08 19 00 (44.2 - saved Checking checksums",2023-08-08T11:06:55Z,llama,https://github.com/meta-llama/llama/issues/641 640,1840550810,"Improve Tokenizer Class: Error Handling, Flexibility","**Input Validation and Error Handling:** - Added input validation checks for the and parameters in the method to ensure they are boolean values. - Enhanced error messages for better context and debugging. **Flexible Model Loading:** - Modified the constructor of the Tokenizer class to optionally accept a model path or URL. - Users can now load models from URLs or Hugging Face model identifiers, making it more versatile for different deployment scenarios. **Handling Unknown Tokens:** - Improved tokenization by handling unknown tokens using SentencePiece's . Tokens outside the vocabulary range are replaced with the token. These changes contribute to the overall reliability and usability of the Tokenizer class, enabling smoother integration into various projects.",2023-08-08T04:39:31Z,llama,https://github.com/meta-llama/llama/pull/640 638,1840340416,what are the lama v2 data mixtures exactly? , ,2023-08-07T23:34:40Z,llama,https://github.com/meta-llama/llama/issues/638 637,1840031723,Support macOS for downloading,"MacOS doesn't have an command, so we need to execute the command for each file separately. The Linux path should act the same as before.",2023-08-07T18:46:59Z,llama,https://github.com/meta-llama/llama/pull/637 636,1839512513,AWS G4dn : no GPUs found!,"I run I use AWS G4ad instance (g4dn.4xlarge) : 16 vCPU and 64GB) G4dn instances feature NVIDIA T4 GPUs and custom Intel Cascade Lake CPUs, and are optimized for machine learning inference and small scale training : But I still have: the full logs: ",2023-08-07T13:56:32Z,llama,https://github.com/meta-llama/llama/issues/636 635,1839458598,GQA for smaller models,"Hello, could we please have 13b and 7b models with the updated architecture that includes grouped query attention? A lot of people are running these models on machines with low memory and this would really help them to use a larger context. A context of 4096 just needs too much memory to be feasible right now with good speed and quality on most common hardware. Thank you!",2023-08-07T13:27:05Z,llama,https://github.com/meta-llama/llama/issues/635 634,1839324885,Training precision,"What precision is used in Llama 2 training? I heard it's , but why is the checkpoint released in ? Also, casting checkpoints to slightly degrades performance, reducing MMLU accuracy by about 0.3%",2023-08-07T12:11:58Z,llama,https://github.com/meta-llama/llama/issues/634 633,1839278482,Total download size?,What is the size of the downloads when running download.sh? Is it possible to get this documented?,2023-08-07T11:42:11Z,llama,https://github.com/meta-llama/llama/issues/633 632,1839157268,Update download.sh, ,2023-08-07T10:27:29Z,llama,https://github.com/meta-llama/llama/pull/632 631,1838625851,【llama65B】,"**预测的时候报错** RuntimeError: shape '[-1, 271]' is invalid for input of size 568 **运行指令** NCCL_SOCKET_IFNAME=eth1 NCCL_DEBUG=INFO python --stage sft --model_name_or_path --do_predict --dataset alpaca_zh --finetuning_type lora --checkpoint_dir --output_dir path_to_predict_result --per_device_train_batch_size 1 --prompt_template default --lora_target W_pack --predict_with_generate --max_samples 20 ",2023-08-07T03:53:02Z,llama,https://github.com/meta-llama/llama/issues/631 629,1838467761,What is llama2 trained on,"Hello Does llama2 provide a list of sources used for training the model.if so, where is that made available.. Is the complete code and training sources available in this github repo? Thanks",2023-08-06T23:56:17Z,llama,https://github.com/meta-llama/llama/issues/629 628,1838224189,Llama 2 model download error,"Hi, I recently tried downloading the LLama2 AI model following the instructions provided in the email I received from Meta after registration. On my initial attempt, I successfully downloaded one model. However, subsequent attempts resulted in receiving HTML content indicating an internal error, rather than the expected model files. **Steps to reproduce:** 1. 2. 3. 4. Execute 5. Paste the URL provided in the email. 6. Initiate the download. Could you please assist in resolving this? I'd appreciate any suggested workarounds or further guidance if you need more details. Thank you. ",2023-08-06T14:11:17Z,llama,https://github.com/meta-llama/llama/issues/628 627,1837954160,ImportError: cannot import name 'Literal' ,"Running $ torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b --to kenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 on ubuntu-18.04, WSL2 on Windows ",2023-08-05T22:02:57Z,llama,https://github.com/meta-llama/llama/issues/627 625,1837198033,Just received my URL and getting 403,"Hi I just got my email with the URL for downloading the models. I followed the instructions from the README: I am executing as follows: Then I got the following prompt: So, I paste the URL from the email (it starts with Then I got: and the output is: Could you help me? Thanks in advance ",2023-08-04T18:46:34Z,llama,https://github.com/meta-llama/llama/issues/625 624,1836667204,Logging & privacy during model use,"I was looking for any specific details around: 1) What happens to the data that is run through Llama 2? Is it logged or sent elsewhere? If yes what is done with that information? 2) Is there any other information (like telemetry or metadata) that is logged or sent elsewhere? If yes what is done with that information? 3) Does Meta claim any on the data is processed using Llama 2? We are excited to test out this model but it would be great to get clarification around this.",2023-08-04T12:33:10Z,llama,https://github.com/meta-llama/llama/issues/624 623,1836594597,Using a different number of GPUs by merging weights,"I was able to merge the 8 files for the 70B model into two, such that it runs on 2 (large) GPUs, in our case we have 2 of 80GB but not 8. It seems to work as well, e.g. I was able to run the model and got an excellent response where gibberish is expected if the weights are wrongly concatenated. Could this be a script or something in this repository? Is there a better or easier way to do this? ",2023-08-04T11:40:02Z,llama,https://github.com/meta-llama/llama/issues/623 621,1836123936,LLaMA-2 models to do Recommendations (explanation generation),I am looking into developing a model to make recommendations using LLaMA-2-7B as the base model. I would be grateful if I can get to know that were there some review data in the pretraining dataset.,2023-08-04T06:05:16Z,llama,https://github.com/meta-llama/llama/issues/621 620,1835693304,معدل دوران المخزون , ,2023-08-03T20:31:35Z,llama,https://github.com/meta-llama/llama/issues/620 619,1835577287,example_text_completion.py gives AssertionError: tokenizer.model,"Hi, after I downloaded the Llama-2 model weights and git cloned the llama repo, tried to run the following command on a AWS G5.12xlarge instance torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 But it gives the following error: AssertionError: tokenizer.model ERROR failed (exitcode: 1) local_rank: 0 (pid: 7424) of binary: I do not see any file named tokenizer.model anywhere in the downloaded folders. Thanks and Regards, -Hari",2023-08-03T19:03:01Z,llama,https://github.com/meta-llama/llama/issues/619 618,1835019127,debug: Trying to understand why cannot allocate in one of the layers, ,2023-08-03T12:59:36Z,llama,https://github.com/meta-llama/llama/pull/618 617,1834935964,"The client socket has timed out after 900s while trying to connect to (127.0.0.1, 29500)","Windows 10 pro Nvidia Geoforce GTX 1080 (8Go vRAM) Intel Core i7 CPU 4Ghz RAM 64 Go I run for 7b: I have this error Thank you",2023-08-03T12:09:35Z,llama,https://github.com/meta-llama/llama/issues/617 615,1834620884,"RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())] error","Hi, I'm trying to use Llama tokenizer but I have trouble with this issue RuntimeError: Internal: [model_proto->ParseFromArray(serialized.data(), serialized.size())] My code is and error occurs in How can I fix it?",2023-08-03T09:10:31Z,llama,https://github.com/meta-llama/llama/issues/615 614,1834373241,There's something wrong with LLama-2-70b and huggingface transformers when I try to finetune it.,The size of tensor a (1024) must match the size of tensor b (8192) at non-singleton dimension 2,2023-08-03T06:31:23Z,llama,https://github.com/meta-llama/llama/issues/614 613,1834226158,rocm.5.4.2 AMD,"hello guys. is there any hope having support for rocm 5.4.2 amd gpus for example rx 6900 xt on ubuntu 22.04?",2023-08-03T03:56:01Z,llama,https://github.com/meta-llama/llama/issues/613 612,1834157028,"Confused about the ""hf"" meaning.","So, could any one tell me the ""hf"" mean in Llama-2-70b-hf? What's the difference between Llama-2-70b-hf and Llama-2-70b in hugging face? ""hf"" means fp16? or hugging-face-format?",2023-08-03T02:12:53Z,llama,https://github.com/meta-llama/llama/issues/612 611,1832964618,i cant install showing failed . help me out.,"failed to create process. while using this --"""""""""""" torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b --tokenizer_path tokenizer.model --max_seq_len 128 --max _batch_ """"""""""""""""""""""""""""""""""""""""""""""",2023-08-02T11:06:55Z,llama,https://github.com/meta-llama/llama/issues/611 610,1832837893,Enable resumable download for model parameters,"Had to turn off my laptop, and then figured I'd continue downloading weights later. Noticed that the default behavior was to redownload every file. By setting any existing file will be used as byte offset, and starting downloading again will continue from that offset. If the remote file changes unexpectedly, I think this could` lead to corrupted files, but I'm assuming these files are seen as static assets. True?",2023-08-02T09:48:54Z,llama,https://github.com/meta-llama/llama/pull/610 609,1832823042,Cannot download the models' weights via bash,"Dear the team, Thank you for your excellent contributions. I tried to clone the repo and run and then paste the URL to the . However, I consistently got the error: Do you have any suggestion? Thanks!",2023-08-02T09:40:47Z,llama,https://github.com/meta-llama/llama/issues/609 608,1832334412,ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 6914) of binary: /usr/bin/python3,"I tried this on colab : ! torchrun --nproc_per_node 1 example_text_completion.py ! --ckpt_dir ! --tokenizer_path tokenizer.model ! --max_seq_len 64 --max_batch_size 1 #(instead of 4) and getting following error : > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 ERROR failed (exitcode: -9) local_rank: 0 (pid: 6914) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ===================================================== example_text_completion.py FAILED ----------------------------------------------------- Failures: ----------------------------------------------------- Root Cause (first observed failure): [0]: time : 2023-08-02_02 55 host : c6666b425cdc rank : 0 (local_rank: 0) exitcode : -9 (pid: 6914) error_file: traceback : Signal 9 (SIGKILL) received by PID 6914 ===================================================== ",2023-08-02T02:50:31Z,llama,https://github.com/meta-llama/llama/issues/608 607,1832242757,Update download.sh, ,2023-08-02T01:08:13Z,llama,https://github.com/meta-llama/llama/pull/607 606,1831871003,How to self correc the model?,"I'm trying to make use of the model to customize it universal. Because training an existing model with mode data will cost computing power. As an alternative, I would like to have a setup where I prompt it for answers, if it doesn't have it prompts me with questions on missing piece. Also I'd like to prompt it with facts so that the model or the fact tree culls itself downward to truths of units.",2023-08-01T19:11:34Z,llama,https://github.com/meta-llama/llama/issues/606 605,1831270721,"Are special tokens missing from the repo's tokenizer? (B_INST, E_INST, B_SYS, E_SYS)","In some special tokens are given for inference: But these are not included in the tokenizer from the repo, nor are they added to the tokenizer in the example inferencing code or tokenizer.py. Also, in the llama-recipes repo, there is a comment about making sure your tokenizer supports adding the INST ""tokens"", but again, inspecting the tokenizer that is used by the script, at the point it is used, shows that the tokens aren't in the vocab ( I guess I'm forced to assume that the tokenizer used to pretrain the Llama-2s included these special tokens (other special tokens are confirmed to be in use after all, i.e. ""A special token is utilized to separate the prompt and answer segments."" from the Llama-2 paper). So, I think that I should add them to my tokenizer myself, but confirmation would be appreciated. As this affects both inferencing and finetuning, this is an important thing to know for sure. Thanks.",2023-08-01T13:18:43Z,llama,https://github.com/meta-llama/llama/issues/605 603,1830922838,closing signal SIGTERM,"I use AWS c5.4xlarge (32G RAM and 16 vCPU) instance. 7B work fine. But 13B and 13B-chat I have a problem. When I run: I have this error: Thank you ",2023-08-01T10:09:56Z,llama,https://github.com/meta-llama/llama/issues/603 602,1830830046,md5sum: WARNING: 156 of 156 computed checksums did NOT match,"I'm trying to download LLAMA2 model into my local machine windows11, downloaded download.sh file from here into my pwd. After running bash download.sh and entering mail in my cmd. tokenizer, tokenizer checklist, license and use_policies are downloaded and after that I'm getting this error.",2023-08-01T09:20:45Z,llama,https://github.com/meta-llama/llama/issues/602 601,1830759947,How to stream the result?,"The chat_completion() returns the whole result text in sync way. Please help. To make it more human like, is there anyway to get the streamy tokens? I guess the key point is under generate() in class Llama, but I don't know much about torch.",2023-08-01T08:42:13Z,llama,https://github.com/meta-llama/llama/issues/601 600,1830486249,i tried to run the code in windows 11 but showing this error. please give me some ideas to run in locally,"Downloading LICENSE and Acceptable Usage Policy line 17: wget: command not found line 18: wget: command not found Downloading tokenizer line 21: wget: command not found line 22: wget: command not found md5sum: tokenizer_checklist.chk: No such file or directory Downloading llama-2-7b line 52: wget: command not found line 55: wget: command not found line 56: wget: command not found Checking checksums md5sum: checklist.chk: No such file or directory",2023-08-01T05:21:20Z,llama,https://github.com/meta-llama/llama/issues/600 598,1830287148,md5sum: checklist.chk: no properly formatted MD5 checksum lines found,"Getting the following error while downloading any model. I was able to download the models last week with no issue ",2023-08-01T01:07:37Z,llama,https://github.com/meta-llama/llama/issues/598 597,1829347410,Unable to establish SSL connection.," ",2023-07-31T14:43:02Z,llama,https://github.com/meta-llama/llama/issues/597 596,1829319712,How to prompt llama to do multiple choice questions for benchmarking?,"Hello, I'm trying to benchmark llama (and some llama-based models) with a range of question-answer datasets. A question consists of a question and several choices. Currently, my prompt is similar to this format: The generation result is like Is it possible to have the model output a singe choice?",2023-07-31T14:31:16Z,llama,https://github.com/meta-llama/llama/issues/596 595,1829261019,ModuleNotFoundError: No module named 'fairscale',"Macbook pro: 2,3 GHz Intel Core i7 - 4 cores 16 Go 3733 MHz LPDDR4X Intel Iris Plus Graphics 1536 Mo All requirements are installed: When I run I have: Thank you",2023-07-31T14:00:53Z,llama,https://github.com/meta-llama/llama/issues/595 594,1828710327,How to load llama2-70b-chat with only 4 GPUs(A6000 ada 48GB),"The default llama2-70b-chat is sharded into 8 pths with MP=8, but I only have 4 GPUs and 192GB GPU mem. Is there any way to reshard the 8 pths into 4 pths? So that I can load the state_dict for inference.",2023-07-31T08:53:26Z,llama,https://github.com/meta-llama/llama/issues/594 593,1828705808,Config.json Error,"When I try to use it with the following code: `from langchain.llms import HuggingFaceHub google_kwargs = {'temperature':0.6, 'max_length': 64} llm = huggingfacehub_api_token=hugging_face_token, model_kwargs=google_kwargs) name = llm('I want to open an Italian restaurant, suggest me a name for this') print(name)` It returns me this error: Why this?",2023-07-31T08:50:36Z,llama,https://github.com/meta-llama/llama/issues/593 592,1828645362,Redirects are currently not supported in Windows or MacOs.,"# I am using Anaconda Environment on Windows, GPU-enabled Pytorch. I tried modifying line 62 in to but still doesn't work. Error message starts here: (pytorch) PS torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 NOTE: Redirects are currently not supported in Windows or MacOs. [W C 601] [c10d] The client socket has failed to connect to [Yinhao]:29500 (system error: 10049 - The requested address is not valid in its context.). [W C 601] [c10d] The client socket has failed to connect to [Yinhao]:29500 (system error: 10049 - The requested address is not valid in its context.). [W C 601] [c10d] The client socket has failed to connect to [Yinhao]:29500 (system error: 10049 - The requested address is not valid in its context.). [W C 601] [c10d] The client socket has failed to connect to [Yinhao]:29500 (system error: 10049 - The requested address is not valid in its context.). Traceback (most recent call last): File line 55, in torch.distributed.init_process_group(""nccl"") # original File line 907, in init_process_group default_pg = _new_process_group_helper( File line 1013, in _new_process_group_helper raise RuntimeError(""Distributed package doesn't have NCCL "" ""built in"") RuntimeError: Distributed package doesn't have NCCL built in ERROR failed (exitcode: 1) local_rank: 0 (pid: 31372) of binary: Traceback (most recent call last): File line 197, in _run_module_as_main return _run_code(code, main_globals, None, File line 87, in _run_code exec(code, run_globals) File line 7, in File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_text_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-07-31_17 37 host : Yinhao.home rank : 0 (local_rank: 0) exitcode : 1 (pid: 31372) error_file: traceback : To enable traceback see: ============================================================ ",2023-07-31T08:12:02Z,llama,https://github.com/meta-llama/llama/issues/592 591,1828338560,Question about llama2 Accuracy in paper with the measured,"Hi, We find the evaluation accuracy (BoolQ, PIQA, HellaSwag, WinoGrande, results are different between the data from llama2 paper and here or the ones we measured withhere For example, llama2 paper shows HellaSwag acc 77.2 for llama2 7b but here shows 78.6; Paper shows ARC-c acc is 45.9, while we measured withhere it is 43.43. Thus, may I ask if it is a way to understand these differences? (maybe come from the difference of how we benchmark the accuracy? And which is the way that the paper is used?) Thanks! ",2023-07-31T03:36:14Z,llama,https://github.com/meta-llama/llama/issues/591 590,1827970602,When will the Llama 2 34B model be released?, ,2023-07-30T15:15:40Z,llama,https://github.com/meta-llama/llama/issues/590 589,1827842500,Update download.sh, ,2023-07-30T07:53:54Z,llama,https://github.com/meta-llama/llama/pull/589 588,1827842116,https://download.llamameta.net/*?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV9oYXNoIjoiRz9cdTAwMDI%7EfT9zXG4iLCJSZXNvdXJjZSI6Imh0dHBzOlwvXC9kb3dubG9hZC5sbGFtYW1ldGEubmV0XC8qIiwiQ29uZGl0aW9uIjp7IkRhdGVMZXNzVGhhbiI6eyJBV1M6RXBvY2hUaW1lIjoxNjkwNzg5MzgzfX19XX0_&Signature=a8Z9BfiLoK1Kw3BfMz95NAzlLjiLO8mHcqvPUm9tB8mfMolym4wos7CR6sN13hOvKhclXnQ%7E2Sh%7E6NzLWCALo1gBnmICyXsiaEIG0bh%7EcRps9I96wWf89mKMsyTo3VnafZxge9mbXQ1enD2VFtpg%7EdVN38SQNMolX-tbjextWbNmJu3Un8E8S8u394Wo%7EFQj5GXKLzHB55F3Ty6Aw4uBQ%7ELcvsSZRS5Ma3o-6lkqO3bQMd6PjV7d%7E4wfD6f0a6bdtPZK-4T-jAH-acPOYEkWGC5NZ486ESF7-Uk1iOnnwFqxqMqhZVZZH4EWPtMasn4SbLOa1Q75HigE5bvCCyK8Yw__&Key-Pair-Id=K15QRJLYKIFSLZUpdate download.sh, ,2023-07-30T07:52:09Z,llama,https://github.com/meta-llama/llama/pull/588 587,1827549998,Where is the download script?,"The readme says in relevant part: > Once your request is approved, you will receive a signed URL over email. Then run the download.sh script, passing the URL provided when prompted to start the download. Maybe it's just me, but I see **nothing** about _where_ to find this script _to begin with_. I decided to just guess: But of course that didn't work. Evidently other people had no issues here, so, can anyone help me out here? SOON, before my 24 hours expire later today? Thanks.",2023-07-29T16:35:45Z,llama,https://github.com/meta-llama/llama/issues/587 586,1827497225,llama-2-70b-cht-hf. git lfs. cloning issue,"run > git clone I have plenty of space for this model. git clone Cloning into 'Llama-2-70b-chat-hf'... remote: Enumerating objects: 78, done. remote: Counting objects: 100% done. remote: Compressing objects: 100% done. remote: Total 78 (delta 19), reused 0 (delta 0), pack-reused 24 Unpacking objects: 100% 504.52 KiB | 2.56 done. Filtering content: 100% 32.96 GiB | 5.44 done. Encountered 28 files that may not have been copied correctly on Windows: model-00001-of-00015.safetensors pytorch_model-00011-of-00015.bin pytorch_model-00001-of-00015.bin model-00011-of-00015.safetensors model-00007-of-00015.safetensors pytorch_model-00007-of-00015.bin model-00003-of-00015.safetensors pytorch_model-00003-of-00015.bin pytorch_model-00010-of-00015.bin pytorch_model-00006-of-00015.bin pytorch_model-00005-of-00015.bin pytorch_model-00009-of-00015.bin pytorch_model-00013-of-00015.bin model-00005-of-00015.safetensors model-00009-of-00015.safetensors pytorch_model-00002-of-00015.bin model-00006-of-00015.safetensors model-00013-of-00015.safetensors model-00010-of-00015.safetensors model-00002-of-00015.safetensors pytorch_model-00004-of-00015.bin pytorch_model-00008-of-00015.bin pytorch_model-00012-of-00015.bin model-00012-of-00015.safetensors pytorch_model-00014-of-00015.bin model-00014-of-00015.safetensors model-00004-of-00015.safetensors model-00008-of-00015.safetensors ",2023-07-29T14:05:17Z,llama,https://github.com/meta-llama/llama/issues/586 585,1827429562,Incorrect and inconsistent results,"Why does the same code on re-running return something totally incorrect and often times return texts besides json that seems completely out of context: {text} It returned the following text (note that text on Spacy at the end of json string is also generated by the model. Below is the result I get the second time I run it: Could someone help understand why this is happening? What parameters am I setting wrong? Or is the prompt incorrect? Many thanks! sbs ",2023-07-29T10:06:53Z,llama,https://github.com/meta-llama/llama/issues/585 584,1827419884,ERROR 403: Forbidden.,"--2023-07-29 14 14-- Resolving download.llamameta.net (download.llamameta.net)... 198.18.2.46 Connecting to download.llamameta.net (download.llamameta.net)|198.18.2.46|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2023-07-29 14 14 ERROR 403: Forbidden. --2023-07-29 14 14-- Resolving download.llamameta.net (download.llamameta.net)... 198.18.2.46 Connecting to download.llamameta.net (download.llamameta.net)|198.18.2.46|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2023-07-29 14 15 ERROR 403: Forbidden. --2023-07-29 14 15-- Resolving download.llamameta.net (download.llamameta.net)... 198.18.2.46 Connecting to download.llamameta.net (download.llamameta.net)|198.18.2.46|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2023-07-29 14 15 ERROR 403: Forbidden. --2023-07-29 14 15-- Resolving download.llamameta.net (download.llamameta.net)... 198.18.2.46 Connecting to download.llamameta.net (download.llamameta.net)|198.18.2.46|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2023-07-29 14 15 ERROR 403: Forbidden. Checking checksums",2023-07-29T09:36:29Z,llama,https://github.com/meta-llama/llama/issues/584 583,1827308090,Can't compute the logprobs of generated tokens,"When I set to True, I found that the model's output logits are like: [-0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0] I carefully checked the source code and found that each target token is but not the last generated token. Is that correct, or do I miss anything? If it is a bug, I think just moving Line 137-143 after Line 155 can fix this bug.",2023-07-29T04:13:36Z,llama,https://github.com/meta-llama/llama/issues/583 582,1827282422,No permission to run ./download.sh,"When I run Mac shows that: (llama2) richardxu llama % zsh: permission denied: it never reminds me to paste URL",2023-07-29T02:53:27Z,llama,https://github.com/meta-llama/llama/issues/582 581,1826957295,I can not download,When I execute the file in theory the download starts but the empty folder is created.,2023-07-28T19:05:02Z,llama,https://github.com/meta-llama/llama/issues/581 580,1826851806,Does Llama 2 trained on current data? When was the Llama2 models trained?, ,2023-07-28T18:03:53Z,llama,https://github.com/meta-llama/llama/issues/580 579,1826779372,"Llama2 Taking Too Long to Generate Text, How can I make it use GPU instead of CPU? pipe.to(""cuda"") doesn't work with TextGenerationPipeline?","Here is the method I am referring to in my code: def generate_story(scenario): tokenizer = model = pipeline = transformers.pipeline( ""text-generation"", #task model=model, tokenizer=tokenizer, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map=""auto"", max_length=1000, do_sample=True, top_k=10, ) template = """""" You are an expert writer; You can generate a script for a short animation that is informative, fun, entertaining, and is made for kids. Do not start the script with words or phrases like ""Once upon a time"" or ""Long ago in a land far away"", the story should be no more than 750 words. Make the script well written even using emotions by surrounding them with brackets, such as [laughs], [cries], etc. Make the script last long enough for a 5 minute video. Return nothing but the story. Leave out any extra words that have nothing to do with the story. CONTEXT: {scenario} STORY: """""" llm = HuggingFacePipeline(pipeline=pipeline,model_kwargs={'temperature':0}) prompt = PromptTemplate(template=template, input_variables=[""scenario""]) llm_chain = LLMChain(prompt=prompt, llm=llm) return llm_chain.run(scenario)",2023-07-28T17:13:44Z,llama,https://github.com/meta-llama/llama/issues/579 578,1826623750,llama2 - loss declines too slowly,"Hi everyone, I am fine-tuning the llama2, but the loss is declining very slowly, and I am a little confused about the reason. Prior to this, I had fine-tuned the llama1 and the loss dropped significantly at that time The picture below is the loss decline curve I was fine-tuning the llama2. I hope scholars who have similar problems and know how to solve them can give me some suggestions. Thanks!!!!!! ",2023-07-28T15:22:09Z,llama,https://github.com/meta-llama/llama/issues/578 577,1826590826,AssertionError: no checkpoint files found in llama-2-7b,"Hello, I'm trying to run llama2-7b text completion and I get the below error. All model downloads were successful Any help will be appreciated.",2023-07-28T15:05:42Z,llama,https://github.com/meta-llama/llama/issues/577 576,1826465675,md5sum: checklist.chk: no properly formatted MD5 checksum lines found,"Hi, I checked previous issues, and I try these and still didn't manage to download the model. - I double check the the link, I even click it and then copy it from the URL. (it's starts with - I delete the file and clone the repo again and asked for a new link. - I try to download each model separately but still could not. I am geussing this might be the issue, I get this line in the middle and also every file download is stopped at the beginning like this: ",2023-07-28T13:48:09Z,llama,https://github.com/meta-llama/llama/issues/576 575,1826108135,ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.,"from transformers import AutoTokenizer, AutoModelForCausalLM model_path = tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False) I want to load Llama model. I am facing above issue.",2023-07-28T09:48:26Z,llama,https://github.com/meta-llama/llama/issues/575 573,1825877341,link to download Llama 2,"Hi, I try to begin with Llama 2 and download it. I got an e mail with the following : 1. Visit the Llama repository in GitHub and follow the instructions in the [README] to run the download.sh script. 2. When asked for your unique custom URL, please insert the following: 3. Select which model weights to download But I dont find the readme link mentionned in that e mail. When I click on the link, there is no field to enter the url. Could someone help me ? ",2023-07-28T07:35:36Z,llama,https://github.com/meta-llama/llama/issues/573 572,1825855039,"what is meaning of max_seq_len: int = 512, max_gen_len: Optional[int] = None? please can anybody give me the answer? ",max_seq_len is it about combined length of input context text and output text? ,2023-07-28T07:18:02Z,llama,https://github.com/meta-llama/llama/issues/572 571,1825671781,"Can't run torchrun example because ""No module named 'llama'""","First, I don't have GPU. Just CPU. I installed pytorch: I installed all requeirements that are in the txt file. I ran: and all requirements are satisfied. BUT, when I run the 7B example this is what happened: But, if I run: So, I don't know where is my problem. Please help. ",2023-07-28T04:16:30Z,llama,https://github.com/meta-llama/llama/issues/571 570,1825660757,convert hf llama2 weight to meta llama2 weight ,"We get a llama2-70B model, which was fine-tuned from huggingface model, and save as huggingface format in one file. We want to convert this model to meta llama2 weight . Then we can use this repo to run my model. There is a script for converting llama weights to hf: Is there a script for this reverse converting (conver hf weights to meta llama)? ",2023-07-28T03:59:51Z,llama,https://github.com/meta-llama/llama/issues/570 569,1825614732,About the dataset,"hello! I read your paper (llama and llama2). I noticed that you set different sample rates and training epochs for different data. But there is no explanation in the article about how these numbers are set. So I would like to ask you how these settings are calculated.",2023-07-28T03:00:55Z,llama,https://github.com/meta-llama/llama/issues/569 568,1825569184,Change license so Using Llama output to fine tune Galactica is ok?,"For research purposes, it would be great to use Llama2 70B like ChatGPT api to generate data for fine tuning Galactica. To my understanding, it is only allowed to use output of llama2 to fine tune other llama2. Wouldn't it make sense to change the license so one can do this for research purposes on your own model, Galactica, too? Otherwise, I just use Falcon-40b. I want to try to generate data for RLAIF with autocrit ( to make the model reason such that it sticks to the truth better. ",2023-07-28T02:08:06Z,llama,https://github.com/meta-llama/llama/issues/568 567,1825074241,Fixed invalid bash for '*' string replacement, ,2023-07-27T19:38:12Z,llama,https://github.com/meta-llama/llama/pull/567 566,1824651716,Feasibility of using Llama2 LLM on AWS EC2 G4dn.8xLarge and Inferentia 2.8xlarge Instances,"Hi all, Is it possible to do inference on the aforementioned machines as we are facing so many issues in Inf2 with Falcon model? Context: We are facing issues while using on the Inf2.8xl machine. We were able to run the same experiment on G5.8xl instance successfully but we are observing that the same code is not working on Inf2 machine instance. We are aware that it has Accelerator instead of NVIDIA GPU. Hence we tried the neuron-core's capability and added required helper code for using the capability of neuron-cores of the instance by using the torch-neuronx library. The code changes and respective error screenshots are provided below for your reference: Code without any torch-neuronx usage - Generation code snippet: generation_output = model.generate( input_ids = input_ids, attention_mask = attention_mask, generation_config = generation_config, return_dict_in_generate = True, output_scores = False, max_new_tokens = max_new_tokens, early_stopping = True ) #print(""generation_output"") #print(generation_output) s = generation_output.sequences[0] output = tokenizer.decode(s) Code using torch-neuronx - helper function code snippet: def generate_sample_inputs(tokenizer, sequence_length): dummy_input = ""dummy"" embeddings = tokenizer(dummy_input, max_length=sequence_length, padding=""max_length"",return_tensors=""pt"") return tuple(embeddings.values()) def compile_model_inf2(model, tokenizer, sequence_length, num_neuron_cores): use only one neuron core os.environ[""NEURON_RT_NUM_CORES""] = str(num_neuron_cores) import torch_neuronx payload = generate_sample_inputs(tokenizer, sequence_length) return torch_neuronx.trace(model, payload) model = compile_model_inf2(model, tokenizer, sequence_length=512, num_neuron_cores=1) Can this github issue address our specific problems mentioned above? My queries are basically: 1. Can we try Llama 2 on G4dn.8xLarge and Inferentia 2.8xlarge instances or it is not supported yet? If not, which machine instance we should try considering cost-effectiveness? 2. Is it feasible to do inference with Falcon on Inf2 or should we go for G4dn.8xlarge as we are facing so many issues in Inf2?",2023-07-27T15:44:16Z,llama,https://github.com/meta-llama/llama/issues/566 565,1824544396,loading llama-2-7b on multiple GPUs,"Hi, I have 4 GPUs of 11GB each, Is it possible to load the model parallelly? Running: torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 I get memory error: RuntimeError: CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with to enable device-side assertions. ERROR failed (exitcode: 1) local_rank: 0 (pid: 19441) of binary: Trying: torchrun --nproc_per_node 4 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 I get : ^^AssertionError^: ^Loading a checkpoint for MP=1 but world size is 4^ Any solution for that? ",2023-07-27T14:49:04Z,llama,https://github.com/meta-llama/llama/issues/565 563,1824255160,Output includes input,"Here's the code that I'm running The result: Why is it that output always includes input prompt? Am I missing some special token? Thanks!",2023-07-27T12:19:10Z,llama,https://github.com/meta-llama/llama/issues/563 561,1824170373,Hrdware requirements to run 13B and 30B smoothly,I am looking to build a pc which will be able to run these LLMs smoothly and also finetune them. No budget constraints. Please recommend a good build.,2023-07-27T11:25:40Z,llama,https://github.com/meta-llama/llama/issues/561 560,1823996244,LLAMA-2 Finetune,"Hello, I have done fine-tuning using model. After completing the training, I called the trainer.save_model(“trained-model”) but this line is not store model on local disk. Can someone please let me on this issue? Thanks, Sani",2023-07-27T09:37:02Z,llama,https://github.com/meta-llama/llama/issues/560 559,1823868082,Dove, ,2023-07-27T08:24:56Z,llama,https://github.com/meta-llama/llama/pull/559 558,1823420936,Any tutorials out there on how to quantize llama2 models?,Thank you!,2023-07-27T00:55:00Z,llama,https://github.com/meta-llama/llama/issues/558 557,1823200665,Change download directory,"Because the models are too large, can I download them to an external hard drive usb 3.1 making adjustments to the download.sa? How can I retrieve the models from the external drive when I want to use the models? Thank you!",2023-07-26T21:23:17Z,llama,https://github.com/meta-llama/llama/issues/557 556,1823048936,Llama 2 70B weights with model-parallel (MP) = 4,There is any way to convert the 70B model to run on only 4x H100s? The memory utilization for 8x H100 is around 154GB not sure if there is enough space left (6GB) for running it on 4 GPUs,2023-07-26T19:42:48Z,llama,https://github.com/meta-llama/llama/issues/556 555,1822852977,OSError when trying LLama-2 in HuggingFace Pipeline?,"When I try ` pipe = pipeline(""text-generation"", OSError: is not a local folder and is not a valid model identifier listed on ' If this is a private repository, make sure to pass a token having permission to this repo with or log in with and pass . ` I have access to LLama-2 and have also logged in using huggingface-cli. Not sure what the problem is.",2023-07-26T17:26:03Z,llama,https://github.com/meta-llama/llama/issues/555 554,1822234012,Cant download the form isn't working properly ,when i fill in the form it just wont allow it to be sent ,2023-07-26T11:41:13Z,llama,https://github.com/meta-llama/llama/issues/554 553,1822092090,Anyone tried running Llama 2 through Amazon SageMaker Jumpstart?, ,2023-07-26T10:26:23Z,llama,https://github.com/meta-llama/llama/issues/553 552,1821968419,"Add ""-c"" flag to indicate ""wget"" to continue download","Since these models are pretty big in size re-running the download script starts download all over again from the start. That's most likely not expected behavior. Setting **-c** flag in **wget** will continue the download from the point it was interrupted. In my case I resumed download that was interrupted when ~3 GB were already downloaded. ",2023-07-26T09:28:38Z,llama,https://github.com/meta-llama/llama/pull/552 551,1821695551,AssertionError: Loading a checkpoint for MP=1 but world size is 4,"I tried to run llama-2-7b on 4 GPUs by running torchrun --nproc_per_node 4 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 1 But I got the error: AssertionError: Loading a checkpoint for MP=1 but world size is 4",2023-07-26T06:43:22Z,llama,https://github.com/meta-llama/llama/issues/551 550,1821523202,Any instruction for finetuning LLama model in my private dataset?, ,2023-07-26T03:45:01Z,llama,https://github.com/meta-llama/llama/issues/550 549,1821475537,wget continue download. progress: so that It won't make the previous log invisible,continue. progress: so that It won't make the previous log invisible,2023-07-26T02:40:30Z,llama,https://github.com/meta-llama/llama/pull/549 548,1821450333,70b-hf weird performance,"The following code works fine on two A100s with . However, the generated output is extremely weird, where the model keeps repeating ""I'm sorry, I'm sorry"". Does anyone have any idea about the potential reasons?",2023-07-26T02:05:57Z,llama,https://github.com/meta-llama/llama/issues/548 547,1821439632,May I ask if the download script supports breakpoint continuation, ,2023-07-26T01:51:56Z,llama,https://github.com/meta-llama/llama/issues/547 546,1821329581,how to use my downloaded model locally on my macos,"forgive me if the question is naive, im new. I tried to use langchain to access to the model that I have downloaded by running the but I dont know how to access to the model, all instructions online are about using accessing the LLama using HuggingFace, I dont want to use HuggingFace, I want to use my local downloaded model, what should i do, thanks.",2023-07-25T23:39:40Z,llama,https://github.com/meta-llama/llama/issues/546 545,1821320482,how to fine tune llama on biomedical abstractive summarization,"Hi I'm a NLP researcher, how can I fine tune llama for abstractive summarization in a special domain like biomedical , any help please???",2023-07-25T23:32:00Z,llama,https://github.com/meta-llama/llama/issues/545 544,1821257236,"ERROR:gpu_init.cc(523)] Passthrough is not supported, GL is disabled, etc.","Hi there! Sorry to bother you with this one. I've received the URI email, and in attempting to round download.sh have repeatedly encountered this error, [22860 ERROR:gpu_init.cc(523)] Passthrough is not supported, GL is disabled, ANGLE is which terminates download.sh with: [main 2023-07-25T22 40.798Z] Extension host with pid 24696 exited with code: 0, signal: unknown. I've removed and re-cloned the repo to no avail. Wondering if anyone has any guidance for eliminating this little obstacle? And thanks!",2023-07-25T22:26:56Z,llama,https://github.com/meta-llama/llama/issues/544 543,1821184004,Access to SFT dataset or LLaMA2 SFT models,"Hi authors, First of all, thanks for your great work on LLaMA-2! This is an impressive work for open source large language models! I have a question about section 3.1 in the paper, specifically ""Quality is all you need"" section. It mentions that when instruction tuning the base model, you first select 27,540 high quality data examples. Is it possible that you can open source these selected data or the supervised finetuned model, which does not include RLHF? Thanks!",2023-07-25T21:23:30Z,llama,https://github.com/meta-llama/llama/issues/543 542,1821109410,Can't get access to llama-2 models,"Hi all, sorry to open this as an issue; I don't see other ways to diagnose the problem. I've filled the llama-2 form for okhattab on the day of release (and then again since, and the same for llama-1 recently) and I can't seem to get access to the models—or to get any other communication for that matter. I see plenty of seemingly automatic approvals. Anything I can do to facilitate access?",2023-07-25T20:37:56Z,llama,https://github.com/meta-llama/llama/issues/542 541,1821057990,What are your checksum values? I'm curious to track the evolution of the weights people are dl'ing.,"Since the checksums aren't checked into this repo, but are sent along with the model weights, it's difficult to understand if we're getting different weights per person, or to understand if and when the model weights are updated. It would be convenient if the checksums were checked into the repo, and versioned. Then we'd also have the benefit of git history as well.",2023-07-25T20:05:15Z,llama,https://github.com/meta-llama/llama/issues/541 540,1821012076,4 Bit Inference of LLaMA-2-70B,"Has anyone been able to get the LLaMA-2 70B model to run inference in 4-bit quantization using HuggingFace? Here are some variations of code that I've tried based on various guides: `python3 name = # I've also tried vanilla tokenizer = AutoTokenizer.from_pretrained(name) tokenizer.pad_token_id = tokenizer.eos_token_id # for open-ended generation model = AutoModelForCausalLM.from_pretrained( name, torch_dtype=torch.float16, load_in_4bit=True, # changing this to load_in_8bit=True works on smaller models trust_remote_code=True, device_map=""auto"", # finds GPU ) generation_pipe = pipeline( ""text-generation"", model=model, tokenizer=tokenizer, trust_remote_code=True, device_map=""auto"", # finds GPU ) When running all of these variations, I am able to load the model on a 48GB GPU, but making the following call produces an error: `python3 text = ""any text"" response = generation_pipe( text, max_length=128, pad_token_id=tokenizer.pad_token_id, eos_token_id=tokenizer.eos_token_id, ) RuntimeError: shape '[1, 410, 64, 128]' is invalid for input of size 419840 ` What am I doing wrong? Is this even possible? Has anyone been able to get this 4-bit quantization working?",2023-07-25T19:39:10Z,llama,https://github.com/meta-llama/llama/issues/540 539,1820895267,403 Forbidden,"I try with every way possible and request the link 3 times and still cant download the model and getting this error massage even if I am sure that I do everything right : Reusing existing connection to download.llamameta.net:443. HTTP request sent, awaiting response... 403 Forbidden 2023-07-25 21 39 ERROR 403: Forbidden.",2023-07-25T18:22:36Z,llama,https://github.com/meta-llama/llama/issues/539 538,1820609247,help me ," File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( ^^^^^^^^^^^^^^^^^^^^ File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^ File line 20, in main generator = Llama.build( ^^^^^^^^^^^^ File line 62, in build torch.distributed.init_process_group(""nccl"") File line 900, in init_process_group store, rank, world_size = next(rendezvous_iterator) ^^^^^^^^^^^^^^^^^^^^^^^^^ File line 235, in _env_rendezvous_handler rank = int(_get_env_or_raise(""RANK"")) ^^^^^^^^^^^^^^^^^^^^^^^^^ File line 220, in _get_env_or_raise raise _env_error(env_var) ValueError: Error initializing torch.distributed using rendezvous: environment variable RANK expected, but not set",2023-07-25T15:35:17Z,llama,https://github.com/meta-llama/llama/issues/538 537,1820536040,我想找一个女朋友,怎么找, ,2023-07-25T14:58:58Z,llama,https://github.com/meta-llama/llama/issues/537 536,1820225505,fix file permissions (download.sh),"Now ""download.sh"" can be run without changing file permissions",2023-07-25T12:28:38Z,llama,https://github.com/meta-llama/llama/pull/536 535,1820087573,Don't redownload model files if already there,"I was trying to download the 70B-chat model without knowing it's size on disk, then ended up not having enough disk space. After moving it to an external SSD and upon relaunching the download script, it seems that it is redownloading (consolidated.xx) files that are already there. It would be nice to avoid this by using some kind of checksum to see if file is already here without having to redownload. This would also reduce server bandwidth",2023-07-25T11:04:49Z,llama,https://github.com/meta-llama/llama/issues/535 534,1819956418,How To Run Llama 2 without a gpu?,Is there any way to run without gpu or with an integrated graphics card?,2023-07-25T09:54:01Z,llama,https://github.com/meta-llama/llama/issues/534 533,1819817833,Old access form vs new access form,"I submitted a request for access to the llm model last month. but i somehow missed the email. I wanted to first of all understand if the models are the same? and how long will it take to get the new access link from the new form? ",2023-07-25T08:31:23Z,llama,https://github.com/meta-llama/llama/issues/533 532,1819723119,Use without torchrun,"I am able to run this model with torchrun. But I want to use in a python script where I can load the model and based upon the question I used to get response. while loading model like below i am getting error: testun.py script: from llama import Llama generator = Llama.build( ckpt_dir='llama-2-13b-chat', tokenizer_path='tokenizer.model', max_seq_len=512, max_batch_size=4, ) Please help on that ",2023-07-25T07:28:10Z,llama,https://github.com/meta-llama/llama/issues/532 531,1819677978,How to solve it?,"torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Traceback (most recent call last): File ""example_text_completion.py"", line 55, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File ""example_text_completion.py"", line 18, in main generator = Llama.build( File line 93, in build tokenizer = Tokenizer(model_path=tokenizer_path) File line 18, in __init__ self.sp_model = SentencePieceProcessor(model_file=model_path) File line 447, in Init self.Load(model_file=model_file, model_proto=model_proto) File line 905, in Load return self.LoadFromFile(model_file) File line 310, in LoadFromFile return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg) RuntimeError: Internal: unk is not defined. ERROR failed (exitcode: 1) local_rank: 0 (pid: 3596437) of binary: Traceback (most recent call last): File line 33, in sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_text_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-07-25_14 19 host : suanligpu rank : 0 (local_rank: 0) exitcode : 1 (pid: 3596437) error_file: traceback : To enable traceback see: ============================================================ I checked many methods to solve this problem. ",2023-07-25T06:55:49Z,llama,https://github.com/meta-llama/llama/issues/531 530,1819644271,May I ask how much hard drive capacity is required to download all models, ,2023-07-25T06:32:54Z,llama,https://github.com/meta-llama/llama/issues/530 529,1819642281,Is interrupting the download process considered a one-time download opportunity?,"Excuse me, if I download the model halfway and the network is interrupted, will it count as a download and a waste of opportunity",2023-07-25T06:31:04Z,llama,https://github.com/meta-llama/llama/issues/529 528,1819628343,What is inside llama-2-70b consolidated.00.pth file and how do I read it?,"I tried to print out the contents of the file using below lines of code. what it struck me was the content after printing like that, the size of the file is around 317kb where as the consolidated.00.pth file is close to 17.25gb. weights.txt Are there other contents in the file ? how do I see it? Attached the exported content for reference. ",2023-07-25T06:18:35Z,llama,https://github.com/meta-llama/llama/issues/528 527,1819557272,Unabel to run models on Windows,"I have completed till the python installation after that when I am trying to execute pretrained-model then getting below error. Any idea how to fix this and run the models? I am using **Windows 10** machine and have **Python 3.9.13**. After that when I try to run with _python -m torch.distributed.run_ instead of _torchrun_ then getting below error. ",2023-07-25T05:02:14Z,llama,https://github.com/meta-llama/llama/issues/527 525,1819243851,"no llama library instead ""llamatest""","not sure if this is just a me problem, but for some reason theres no proper llama library. i installed both 7B and 7B-chat and this is what i get: I used download.sh and attempted to use setup.py but got this error: i dont want to say this is an issue on the developers part but any help is appreciated here. FIXED THIS I WAS BEING DUMB CANT FIGURE OUT HOW TO DELETE SORRY ",2023-07-24T22:32:23Z,llama,https://github.com/meta-llama/llama/issues/525 524,1819183547,Error: Checking checksums md5sum: checklist.chk: no properly formatted checksum lines found,"I tried several Models, but it always doesn't work. I got a new Download link, I tried to update my WSL, I updated md5sum. Nothing works. What can I do now?",2023-07-24T21:36:34Z,llama,https://github.com/meta-llama/llama/issues/524 522,1818618812,add: python download script for cross pantform, ,2023-07-24T15:08:21Z,llama,https://github.com/meta-llama/llama/pull/522 521,1818428856,pretrain from scratch,"Hi How can i pretrain LlaMa from scratch in an another language?",2023-07-24T13:27:03Z,llama,https://github.com/meta-llama/llama/issues/521 520,1818373806,Update download.sh, ,2023-07-24T12:56:15Z,llama,https://github.com/meta-llama/llama/pull/520 519,1818322226,How can I resume the download of Path 7 for the Llama 70B model without starting the entire download process from the beginning after the 24-hour time limit expired?,"How can I resume the download of Path 7 for the Llama 70B model without starting the entire download process from the beginning after the 24-hour time limit expired?""",2023-07-24T12:26:06Z,llama,https://github.com/meta-llama/llama/issues/519 518,1817959814,how to calculate word embeddings like openai?,Is there any way to create embeddings using LLMA2 as the base model?,2023-07-24T08:57:45Z,llama,https://github.com/meta-llama/llama/issues/518 517,1817755086,some questions about training of Llama2,"1. In the program of '*-hf', why do you use the type of float16 instead of bf16? 2. Does '*-hf' mean half precision, why are the model sizes of Llama-2-7b-hf and Llama-2-7b the same?",2023-07-24T06:46:15Z,llama,https://github.com/meta-llama/llama/issues/517 516,1817123144,"""RuntimeError: CUDA error: unknown error"" troubleshooting","Hi. I'm trying to figure out how to troubleshoot this generic error message i get from running the example locally in my machine. I suspect either the PyTorch or Cuda version is wrong. Or my hardware is insufficient. How do I determine what the issue is exactly? Im running the project from docker with GPU and virtualization enabled Docker Images I've tried: docker pull docker pull 64GB RAM OS Windows 11 NVIDIA GeForce RTX 3070 GPU mem 8 GB 32 GB ",2023-07-23T13:07:03Z,llama,https://github.com/meta-llama/llama/issues/516 515,1817079437,Remove redundant `.float()` in Attention,"The in the Attention layer is not necessary because internally casts everything to float32, then operates on it, then casts back down to whatever precision you're operating in. Proof: Removing the meant I could get a slightly higher batch size without OOMing.",2023-07-23T10:42:03Z,llama,https://github.com/meta-llama/llama/pull/515 514,1817077353,"Download.sh not working ""no such file or directory"" FIRST SOLUTION BELOW IN THE COMMENTS, IN A FEW DAYS I'LL MAKE THE TUTORIAL","After I understand the debugger to use I tried to run it but it gives to me the error ""the directory can be found"",at the moment I don't have in front of me the PC, after I'll write the correct literally error, can anyone help me?",2023-07-23T10:34:29Z,llama,https://github.com/meta-llama/llama/issues/514 513,1817076956,Update download.sh,Do not have the right to copy paste my link to be able to download the software,2023-07-23T10:32:55Z,llama,https://github.com/meta-llama/llama/pull/513 511,1817057359,Update MODEL_CARD.md, ,2023-07-23T09:28:10Z,llama,https://github.com/meta-llama/llama/pull/511 509,1817041694,download.sh not working,"Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 70B-chat download.sh: 12: [[: not found Downloading LICENSE and Acceptable Usage Policy download.sh: 17: Bad substitution",2023-07-23T08:35:57Z,llama,https://github.com/meta-llama/llama/issues/509 508,1817023518,"how to train large model llama-2-13B-hf for token 4k ,lora by fairscale distributed tensors?","how to train a large model llama-2-13B-hf 4k tokens,lora by distributed tensors metrogon-lm or fairscale? now if I set token max_lenth=4096.it raise cuda out of memory?how to solve it? per device batch size is 1.",2023-07-23T07:30:33Z,llama,https://github.com/meta-llama/llama/issues/508 506,1816962759,Slow inference and poor performance compared to Google Flan-UL2,"I have successfully run the 7b-chat model on my RTX-4070, but I am surprised at how long it takes to generate responses. I have tested it using a set of feature extraction tasks (I feed it a conversation transcript and ask it to answer True or False whether the conversation includes a given feature (EG: a complaint)). Google's Flan-UL2 model has 20B parameters, and is able to answer most questions in under 10 seconds (with 98% accuracy), but llama-7b-chat is taking 60+ seconds per question, and is scoring less than 15% accuracy. The poor accuracy could be attributed to the parameter count disadvantage (I haven't been able to test the 13b model as I only have 1 GPU), but I am very surprised by the slow inference time. Does anybody know what could be causing this? Code below. ",2023-07-23T02:34:58Z,llama,https://github.com/meta-llama/llama/issues/506 505,1816933617,OpenAI API like functions,How would I go about creating something similar to the OpenAI API's chat functions with Llama 2?,2023-07-23T00:15:54Z,llama,https://github.com/meta-llama/llama/issues/505 504,1816891464,Partial support of Apple M1/M2 (via CPU mode),"example of the run: ",2023-07-22T20:55:10Z,llama,https://github.com/meta-llama/llama/pull/504 502,1816838692,how can I speed up the inference process?,"- GPU: **RTX4090** - 7B-chat is load through Huggingface LlamaForCausalLM the hyperparameters listed below the time costs more than 20 seconds, is there any method the speed up the inferences process? ",2023-07-22T17:25:27Z,llama,https://github.com/meta-llama/llama/issues/502 501,1816767193,Unable to use the model - trying sentiment analysis,"I downloaded the model. I gave the location where it is saved but it doesnt run. It asks for a config.json if running from transformers, and asking for model file when running from local. However, there are different parts of model downloaded - how can I give the model file name? Representative code below - I have tried multiple iterations with changing it to local,. etc. import torch from transformers import AutoModelForSeq2SeqLM, AutoTokenizer # Load model and tokenizer model_path = model = AutoModelForSeq2SeqLM.from_pretrained(model_path) tokenizer = AutoTokenizer.from_pretrained(model_path) # Define sentiment analysis prompt text = ""I really enjoyed that movie!"" prompt = f""Sentiment: {text}"" # Encode prompt inputs = tokenizer(prompt, return_tensors=""pt"") # Generate output outputs = model.generate(**inputs) output = tokenizer.decode(outputs[0], skip_special_tokens=True) # Print sentiment print(output)",2023-07-22T13:32:56Z,llama,https://github.com/meta-llama/llama/issues/501 499,1816758069,Can't run without a GPU.,"When I try to run 7B-chat without a GPU its says this: RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!",2023-07-22T13:01:44Z,llama,https://github.com/meta-llama/llama/issues/499 498,1816750281,Can't download models on windows.,"Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: Downloading LICENSE and Acceptable Usage Policy SYSTEM_WGETRC = syswgetrc = Files --2023-07-22 07 25-- Resolving download.llamameta.net... 18.160.96.18, 18.160.96.14, 18.160.96.40, ... Connecting to download.llamameta.net|18.160.96.18|:443... connected. OpenSSL: error SSL routines reason(1000) Unable to establish SSL connection.",2023-07-22T12:33:58Z,llama,https://github.com/meta-llama/llama/issues/498 497,1816719314,Redirects are currently not supported in Windows or MacOs.,"input: torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 outpuy: Redirects are currently not supported in Windows or MacOs. [W [c10d] The client socket has failed to connect to [fw]:29500 (system error: 10049 - 在其上下文中,该请求的地址无效。). [W [c10d] The client socket has failed to connect to [fw]:29500 (system error: 10049 - 在其上下文中,该请求的地址无效。). [W [c10d] The client socket has failed to connect to [fw]:29500 (system error: 10049 - 在其上下文中,该请求的地址无效。). [W [c10d] The client socket has failed to connect to [fw]:29500 (system error: 10049 - 在其上下文中,该请求的地址无效。). Traceback (most recent call last): File line 55, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 18, in main generator = Llama.build( File line 62, in build torch.distributed.init_process_group(""nccl"") File line 907, in init_process_group default_pg = _new_process_group_helper( File line 1013, in _new_process_group_helper raise RuntimeError(""Distributed package doesn't have NCCL "" ""built in"") RuntimeError: Distributed package doesn't have NCCL built in ERROR failed (exitcode: 1) local_rank: 0 (pid: 12188) of binary: Traceback (most recent call last): File line 196, in _run_module_as_main return _run_code(code, main_globals, None, File line 86, in _run_code exec(code, run_globals) File line 7, in File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_text_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: rank : 0 (local_rank: 0) exitcode : 1 error_file: traceback : To enable traceback see: ============================================================ ",2023-07-22T10:45:36Z,llama,https://github.com/meta-llama/llama/issues/497 496,1816716035,Running llama-2-7b timeout in Google Colab,"Here is the Gist: As you can see, after installing Pytorch and run the example command, it runs for 3:30 and the child process is stopped. GPU version is attached in Gist for reference. Is it the memory problem? Or any other insight is appreciated. Thank you very much in advance for FBR great work.",2023-07-22T10:33:05Z,llama,https://github.com/meta-llama/llama/issues/496 495,1816712984,I feel like vicuna is better than llama v2-chat when doing math task,"When performing math tasks, such as: Problem: Given = 2, what is the value of A) 0 B) 1 C) 2 D) 4 I feel that Vicuna 13B performs better than Llama-v2-chat-13b when doing math. Additionally, I've observed that the answers provided by Llama-v2-chat-13b are more consistent across multiple runs, and it seems more inclined to state that a problem is unsolvable. Is this just my imagination? ",2023-07-22T10:21:48Z,llama,https://github.com/meta-llama/llama/issues/495 494,1816645279,Mask is a square matrix but scores might not always be a square matrix,"Hi all, thank you very much for sharing the code and the great work, I'm looking at this piece of code in model.py: Ignoring the batch size and n_local_heads dimensions, the scores matrix's dimension is , where . The mask matrix is a square matrix of size (this piece of code) If the scores matrix is also a square matrix of size seqlen (this is when start_pos=0), or if (in this case we have no mask), these 2 matrices can be added together. However, if and , wouldn't the score matrix's dimensions not match the mask matrix's dimensions? ",2023-07-22T06:29:46Z,llama,https://github.com/meta-llama/llama/issues/494 493,1816640298,download.sh: line 2: $'\\r': command not found,"run download.sh by cygwin in windows but it give back ""download.sh: line 2: command not found"" ",2023-07-22T06:09:46Z,llama,https://github.com/meta-llama/llama/issues/493 491,1816613948,Format messages for chat completion,"Hello, thank you for your excellent work. As a newcomer, I am curious about why the system message's content is prepended to the first user message. This leads to the following type of prompt: why not to format the message in this format? referenced code snippet: source: ",2023-07-22T05:05:53Z,llama,https://github.com/meta-llama/llama/issues/491 490,1816386100,Remove linkshim workaround from README,"As far as I know, this shouldn't be an issue any more. (see Meta-internal Workplace post: ",2023-07-21T21:09:00Z,llama,https://github.com/meta-llama/llama/pull/490 489,1816341193,70B Model is Using 200gb of VRAM,"I am having trouble running inference on the 70b model as it is using additional CPU memory, possibly creating a bottleneck in performance. It is unable to load all 70b weights onto 8 V100 GPUs. How can I make sure it is only running on the GPU is there any way to reduce the memory usage so that I can comfortably run inference on the 8 GPUs? It goes extremely slow because the last layers (below) are running on CPU. I am using the following code: When I query : ` {'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0, 'model.layers.4': 0, 'model.layers.5': 0, 'model.layers.6': 0, 'model.layers.7': 0, 'model.layers.8': 1, 'model.layers.9': 1, 'model.layers.10': 1, 'model.layers.11': 1, 'model.layers.12': 1, 'model.layers.13': 1, 'model.layers.14': 1, 'model.layers.15': 1, 'model.layers.16': 1, 'model.layers.17': 2, 'model.layers.18': 2, 'model.layers.19': 2, 'model.layers.20': 2, 'model.layers.21': 2, 'model.layers.22': 2, 'model.layers.23': 2, 'model.layers.24': 2, 'model.layers.25': 2, 'model.layers.26': 3, 'model.layers.27': 3, 'model.layers.28': 3, 'model.layers.29': 3, 'model.layers.30': 3, 'model.layers.31': 3, 'model.layers.32': 3, 'model.layers.33': 3, 'model.layers.34': 3, 'model.layers.35': 4, 'model.layers.36': 4, 'model.layers.37': 4, 'model.layers.38': 4, 'model.layers.39': 4, 'model.layers.40': 4, 'model.layers.41': 4, 'model.layers.42': 4, 'model.layers.43': 4, 'model.layers.44': 5, 'model.layers.45': 5, 'model.layers.46': 5, 'model.layers.47': 5, 'model.layers.48': 5, 'model.layers.49': 5, 'model.layers.50': 5, 'model.layers.51': 5, 'model.layers.52': 5, 'model.layers.53': 6, 'model.layers.54': 6, 'model.layers.55': 6, 'model.layers.56': 6, 'model.layers.57': 6, 'model.layers.58': 6, 'model.layers.59': 6, 'model.layers.60': 6, 'model.layers.61': 6, 'model.layers.62': 7, 'model.layers.63': 7, 'model.layers.64': 7, 'model.layers.65': 7, 'model.layers.66': 7, 'model.layers.67': 7, 'model.layers.68': 7, 'model.layers.69': 7, 'model.layers.70': 7, 'model.layers.71': 'cpu', 'model.layers.72': 'cpu', 'model.layers.73': 'cpu', 'model.layers.74': 'cpu', 'model.layers.75': 'cpu', 'model.layers.76': 'cpu', 'model.layers.77': 'cpu', 'model.layers.78': 'cpu', 'model.layers.79': 'cpu', 'model.norm': 'cpu', 'lm_head': 'cpu'} `",2023-07-21T20:20:11Z,llama,https://github.com/meta-llama/llama/issues/489 488,1816198623,Could not install,"Hello I got the following problem during installation How can I solve this problem ? ",2023-07-21T18:12:28Z,llama,https://github.com/meta-llama/llama/issues/488 486,1816054983,Inconsistent Usage of .forward and PyTorch's nn.Module __call__,"The codebase exhibits inconsistent use of the methods and PyTorch's . Both methods are employed interchangeably in different modules and scripts which is an inconsistent coding style. To enhance code clarity and maintainability, I believe it is a good practice to choose one method ( or ) and apply it consistently throughout the entire codebase. Identifying all instances where both methods are used and replacing the non-preferred method with the chosen one will improve readability and facilitate collaboration among developers. For example, in model.py on line 239 in the forward method of the class, is used in the sub-layers, however in the class, is used consistently for the sub-layers.",2023-07-21T16:07:46Z,llama,https://github.com/meta-llama/llama/issues/486 485,1816049499,Update generation.py, ,2023-07-21T16:03:22Z,llama,https://github.com/meta-llama/llama/pull/485 484,1816034596,Use of [INST] for chat completions,"I see that INST is used to wrap assistant and user content in chat completions. (Side note: I was thinking it might be in vocab, but see it's not). I'm trying to fine-tune llama-2- 7b-chat for function calling and it is responding with multiple turns (and not stopping at the I think this is an artifact for me incorrectly wrapping with and [INST], which is causing the model to respond using [INST] tokens as well. Here is one sample: Here is another sample: And the code to generate: `def generate(index): system_prompt = data['test'][index]['systemPrompt'] user_prompt = data['test'][index]['userPrompt'] correct_answer = data['test'][index]['assistantResponse'] B_INST, E_INST = ""[INST]"", B_SYS, E_SYS = # Define the roles and their corresponding prompts SYSTEM_ROLE = ""system"" USER_ROLE = ""user"" ASSISTANT_ROLE = ""assistant"" # Define your prompt template with the roles dialog = [ {""role"": SYSTEM_ROLE, ""content"": system_prompt.strip()}, {""role"": USER_ROLE, ""content"": user_prompt.strip()}, ] # Transform dialog into a format compatible with Llama2 dialog_transformed = [ { ""role"": dialog[1][""role""], ""content"": f""{B_SYS}{dialog[0]['content']}{E_SYS}{B_INST}{dialog[1]['content']}{E_INST}"", } ] # Concatenate the 'content' of the messages, maintaining the role sequence prompt = """".join([entry['content'] for entry in dialog_transformed]) print(""Prompt:"") print(prompt) encoding = tokenizer(prompt, return_tensors=""pt"").to(""cuda:0"") output = model.generate(input_ids=encoding.input_ids, attention_mask=encoding.attention_mask, max_new_tokens=200, do_sample=True, temperature=0.01, eos_token_id=tokenizer.eos_token_id, top_k = 0) print() # Subtract the length of input_ids from output to get only the model's response output_text = tokenizer.decode(output[0, len(encoding.input_ids[0]):], skip_special_tokens=True) output_text = output_text) # remove excessive newline characters print(""Generated Assistant Response:"") print(output_text) print() print(""Correct Assistant Response:"") print(correct_answer) print() ",2023-07-21T15:51:55Z,llama,https://github.com/meta-llama/llama/issues/484 483,1816011843,Fine-tunning or continue pre-train in other languages,"Llama2 is open for commercial use. However, the hugginface model card states that its use in other languages is out of scope ( and If we do a fine-tunning or continue the pre-training using datasets in pt-BR would we be infringing any license rules? I work at Brasilian government and we intend continue the pre-train of llama2 with a lot more data in pt-Br to use it as baseline for several task specific future fine-tunnings. We want to open source this portuguese fluent baseline on huggingface. However we are concerned about licence issues. If you can not answer my question, can you suggest any comunication channel to meta so we can clear this doubt?",2023-07-21T15:35:35Z,llama,https://github.com/meta-llama/llama/issues/483 482,1815987960,torch.distributed.elastic.multiprocessing.errors.ChildFailedError:,"I downloaded the llama-2-7b and run the command as they metioned but got this error ",2023-07-21T15:19:58Z,llama,https://github.com/meta-llama/llama/issues/482 481,1815797930,Better documention on the chat text format,"Hi, Right now the project only briefly mentions the format for the chat completion in the README.md file. > The fine-tuned models were trained for dialogue applications. To get the expected features and performance for them, a specific formatting defined in chat_completion needs to be followed, including the INST and <> tags, BOS and EOS tokens, and the whitespaces and breaklines in between (we recommend calling strip() on inputs to avoid double-spaces). However, the code in does not give any examples or explanation on the exact format is being used, and the code is not very clear either. I think a better documentation on how exactly the prompts are formatted before we apply tokenization might be helpful. At least add some examples would be great. **Update:** Here are some examples of the chat text format. Case 1: Prompt ends at 1st user prompt, not answer yet: Case 2: Prompt ends at 2nd user prompt, has 1st answer: Based on these observation, we might create the training data using this format: ",2023-07-21T13:18:30Z,llama,https://github.com/meta-llama/llama/issues/481 480,1815559768,Minimum hardware Requirements to run the models locally?,"what are the minimum hardware requirements to run the models on a local machine ? ### Requirements - CPU : - GPU: - Ram: ### For All models. - Llama2 7B - Llama2 7B-chat - Llama2 13B - Llama2 13B-chat - Llama2 70B - Llama2 70B-chat ",2023-07-21T10:22:19Z,llama,https://github.com/meta-llama/llama/issues/480 479,1815479966,dbgpt already supports llama2,you can run llama2 locally use dbgpt.,"In our project DB-GPT. Supports multiple large language models, currently supporting Vicuna (7b, 13b), ChatGLM-6b (int4, int8), guanaco(7b,13b,33b), Gorilla(7b,13b), 🔥 llama-2(7b, 13b, 70b) We're building some really interesting applications around databases and large language models. ",2023-07-21T09:28:20Z,llama,https://github.com/meta-llama/llama/issues/479 478,1815443646,Loading model from a local folder gives Cannot copy out of meta tensor; no data error,"Hi All, I was successful in running the when i downloaded the model from huggingface. However, it is failing when i try to load the model from a given folder i.e. i have saved the model using the save_pretrained method ` code that i have written to load the model Model load is failing. I am running the code on a GPU machine. **The code is working when i load the model from the hugging face default repository**. Do let me know if any more information is required",2023-07-21T09:06:26Z,llama,https://github.com/meta-llama/llama/issues/478 476,1815407422,Bash error in download.sh,"I'm getting the following error: However looking at the code it seems to be ok. I'm executing this on a WSL2 on Windows 10. Thanks.",2023-07-21T08:41:00Z,llama,https://github.com/meta-llama/llama/issues/476 475,1815282471,Windows10 download.sh error,"When I run the download.sh file and enter the information as prompted, git bash emits an error message `OpenSSL: error SSL routines reason(1000) Unable to establish SSL connection`",2023-07-21T07:15:37Z,llama,https://github.com/meta-llama/llama/issues/475 474,1815261936,Port `stable` branch LLaMA optimizations to LLaMA2,Port branch LLaMA optimizations to LLaMA2,2023-07-21T06:58:08Z,llama,https://github.com/meta-llama/llama/pull/474 473,1815248479,feat: add `--continue` for wget download,the same as but add more ,2023-07-21T06:46:17Z,llama,https://github.com/meta-llama/llama/pull/473 472,1815247744,Extremely slow text generation on Macbook Air 2020 M1,"First time trying this in text generation web ui. Any insights on why it might be slow? Macbook M1 2020 using text generation webui python3 server.py --listen --trust-remote-code --cpu-memory 8 --gpu-memory 8 --extensions openai --loader llamacpp --model TheBloke_Llama-2-13B-chat-GGML --notebook 2023-07-21 06 08 WARNING:trust_remote_code is enabled. This is dangerous. UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable. warn(""The installed version of bitsandbytes was compiled without GPU support. "" 'NoneType' object has no attribute 'cadam32bit_grad_fp32' 2023-07-21 06 09 INFO:Loading TheBloke_Llama-2-13B-chat-GGML... 2023-07-21 06 09 INFO:llama.cpp weights detected: 2023-07-21 06 09 INFO:Cache capacity is 0 bytes llama.cpp: loading model from llama_model_load_internal: format = ggjt v3 (latest) llama_model_load_internal: n_vocab = 32000 llama_model_load_internal: n_ctx = 2048 llama_model_load_internal: n_embd = 5120 llama_model_load_internal: n_mult = 256 llama_model_load_internal: n_head = 40 llama_model_load_internal: n_layer = 40 llama_model_load_internal: n_rot = 128 llama_model_load_internal: freq_base = 10000.0 llama_model_load_internal: freq_scale = 1 llama_model_load_internal: ftype = 2 (mostly Q4_0) llama_model_load_internal: n_ff = 13824 llama_model_load_internal: model size = 13B llama_model_load_internal: ggml ctx size = 0.09 MB llama_model_load_internal: mem required = 8953.71 MB (+ 1608.00 MB per state) llama_new_context_with_model: kv self size = 1600.00 MB AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 2023-07-21 06 09 INFO:Loaded the model in 0.17 seconds. 2023-07-21 06 09 INFO:Loading the extension ""openai""... Starting OpenAI compatible api: OPENAI_API_BASE=http Running on local URL: http 7860 To create a public link, set in .",2023-07-21T06:45:34Z,llama,https://github.com/meta-llama/llama/issues/472 471,1815220502,the latency of llama2 70B larger than llama1 65B,"Hi in my test i use 8X A100 80 GB use llama1 code llama1 65B weight in my test case, one token latency about 68 -- 70 ms, but in the same test case ,use llama2 code,llama2 70B weight,one token latency about 70--73ms,large than llama1 65B. BUT llama2 70b use GQA to improve inference,I want to know ""which one should run faster in same device,llama1 65B or llama2 70B,""",2023-07-21T06:18:43Z,llama,https://github.com/meta-llama/llama/issues/471 470,1815215087,Finutune LLAMA 2 for large tables having 99columns,I am trying to finetune large tables having 99 columns and 180 rows for complex sql queries. I am unable to finetune it as it has 6000 tokens. Can we do that using LLAMA2?. Please assist.,2023-07-21T06:13:36Z,llama,https://github.com/meta-llama/llama/issues/470 469,1815205103,Workaround for `view_as_complex` and complex number multiplication,Originally suggested here: ,2023-07-21T06:02:57Z,llama,https://github.com/meta-llama/llama/pull/469 468,1815140732,"llama-2-70B-chat cannot inference again, multi-gpu volatile all 100%","I want to make a web service from 70B-chat model, but there are some bugs or errors. here is launch shell: here is code: First inference after model built is success, but when i begin to inference second time with The request is success and enter the generate progress, but the volatile of 8 gpu all up to 100% immediately, and there is no any return after waiting long time. after 1800s, processes are be shutdown: I guess it is deadlock on parallel computing. But I cant fix it. Or there is any reliable web service code of 70B-chat-model?",2023-07-21T04:38:19Z,llama,https://github.com/meta-llama/llama/issues/468 467,1815090813,Update download.sh to check for wget,The script fails if you do not have wget installed. Fail early with a nice instruction if that is the case.,2023-07-21T03:19:56Z,llama,https://github.com/meta-llama/llama/pull/467 466,1814923065,with RTX 4070 12 GB it is giving me CUDA out of memory error,"I am trying to understand what am I doing wrong here? Is it true that even smallest size of any llama2 model is 13 Gig ? And that is the reason it is not working in my 12 Gig 4070 Nvidia GPU? Is there any any workaround? Here is the error I am receiving. `idea torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Traceback (most recent call last): File line 55, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 18, in main generator = Llama.build( File line 96, in build model = Transformer(model_args) File line 259, in __init__ self.layers.append(TransformerBlock(layer_id, params)) File line 222, in __init__ self.feed_forward = FeedForward( File line 207, in __init__ self.w3 = ColumnParallelLinear( File line 262, in __init__ self.weight = Parameter(torch.Tensor(self.output_size_per_partition, self.in_features)) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 11.72 GiB total capacity; 10.93 GiB already allocated; 59.19 MiB free; 10.95 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF ERROR failed (exitcode: 1) local_rank: 0 (pid: 330097) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ` ============================================================ example_text_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-07-20_16 32 host : myidea rank : 0 (local_rank: 0) exitcode : 1 (pid: 330097) error_file: traceback : To enable traceback see: ============================================================ ",2023-07-20T23:11:30Z,llama,https://github.com/meta-llama/llama/issues/466 465,1814888554,download.sh: add `--no-config` to all wget calls,"This addresses an issue where wget would use the user's file, which could cause problems if the user had configured wget to use certain user-agents. In general, ignoring the user's config seems like a good idea. Issue I ran into specifically is here: ",2023-07-20T22:33:56Z,llama,https://github.com/meta-llama/llama/pull/465 464,1814879089,How to disable the ethical block in the model?,"How can I disable this ethical nonsense? Extremally disappointed with this Llama2 shitshow. It waisted me 2 days in trying to download the model files, and after I run it, this is what I get. ",2023-07-20T22:27:14Z,llama,https://github.com/meta-llama/llama/issues/464 463,1814671611,OpenAI-like function calling,"Hello guys! Thank you for great job! I'm currently using OpenAI chat completions with function calling but OpenAI main models don't support fine-tuning yet but LLaMA does. Unfortunately Langchain Agents don't provide high quality of results and I'm hoping to find something that executes functions similar to OpenAI function calling (arguments, required, enum) with possibility to fine-tune. Have somebody tried to do that with LLaMA 2 (some hidden trick, some way to train to call functions etc)? Thank you!",2023-07-20T19:28:13Z,llama,https://github.com/meta-llama/llama/issues/463 461,1814467053,Running example got error: torch.distributed.elastic.multiprocessing.api:failed,"I'm testing with the example in README `torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4` And got this error: ` >initializing model parallel with size 1 >initializing ddp with size 1 >initializing pipeline with size 1 **ERROR failed (exitcode: -9) local_rank: 0 (pid: 2667) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: =========================================================== example_text_completion.py FAILED ----------------------------------------------------------- Failures: ----------------------------------------------------------- Root Cause (first observed failure): [0]: time : 2023-07-20_16 26 host : testvm.us-east4-c.c.xxx.internal rank : 0 (local_rank: 0) exitcode : -9 (pid: 2667) error_file: traceback : Signal 9 (SIGKILL) received by PID 2667 ===========================================================` I'm running this on a GCP VM with 1 GPU. The VM configuration: And memory info: Please advise on how to proceed. ",2023-07-20T17:20:22Z,llama,https://github.com/meta-llama/llama/issues/461 460,1814436572,Shell Command in README.md example,"Hi, I was trying to run the pre-trained & fine-tuned examples in the README. During this, I noticed that the code block was missing an '!'. I'd request the admins update the code block to reflect that it is a shell command rather than Python code. It's a minor issue, but it can trip people up, especially beginners like me. In summary, this: `torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 4` Should be this: `!torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 4` I hope this was helpful. Thank You. Regards, Anubhav Shankar",2023-07-20T16:59:27Z,llama,https://github.com/meta-llama/llama/issues/460 459,1814142240,Llama, ,2023-07-20T14:34:40Z,llama,https://github.com/meta-llama/llama/issues/459 458,1814081962,How to run download.sh on a MacBook Pro (M1)?,What do I have to do to run download.sh on my MBP (M1)?,2023-07-20T14:07:05Z,llama,https://github.com/meta-llama/llama/issues/458 456,1813901416,Unable to load model from absolute path,"Created a base image with all supporting Llama requirements. Now mounted models as volume to this image, but model load to failed. Then baked the model inside container in the same directory as the executing python script, it worked. So essentially when tried this : Failed: Llama.build( tokenizer_path=tokenizer_path, max_seq_len=max_seq_len, max_batch_size=max_batch_size, ) Success: Llama.build( ckpt_dir=""llaam2-7b"", tokenizer_path=tokenizer_path, max_seq_len=max_seq_len, max_batch_size=max_batch_size, )",2023-07-20T12:35:39Z,llama,https://github.com/meta-llama/llama/issues/456 455,1813891850,"Out-of-memory for 7B on Ubuntu native, runs fine in WSL2 Ubuntu on same machine, RTX 3060 12 GB, 32 GB RAM","I've been able to run the 7B examples on my PC with 32 GB RAM and nVidia RTX 3060 12 GB. I have dual-boot Ubuntu and installed the Llama git & 2-7b model. When I run it using the example command, I get an out-of-memory error. What can I do to resolve this? I get this error even if I change the batch size to 2. nvidia shows a spike in GPU memory just before the error. `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 11.75 GiB total capacity; 10.97 GiB already allocated; 120.06 MiB free; 10.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF `",2023-07-20T12:30:16Z,llama,https://github.com/meta-llama/llama/issues/455 454,1813815266,How to run 13B & 70B model? Windows 11 WSL2 single GPU,"thanks for Readme.md I can run example text& chat successfully by 2B model but I couldn't by 13B & 70B How to run them? example code in readme is below In 13B model, MP need 2 so I changed to --nproc_per_node 1 to --nproc_per_node 2 I got this error I think this error is caused because I only use single GPU despite 13B model need 2 GPU I don't have any more GPU so I want to run 13B model in single GPU Is there any best practice to solve this problem? thanks my device settings is below OS:Windows 11 ,Code running on WSL2 GPU:RTX 4070Ti ",2023-07-20T11:50:43Z,llama,https://github.com/meta-llama/llama/issues/454 453,1813796909,why i can not load model from llama-2-7b," ",2023-07-20T11:41:00Z,llama,https://github.com/meta-llama/llama/issues/453 452,1813697487,The client socket has failed to connect,"I have followed all the steps in the repo, yet facing this issue- ""The client socket has failed to connect""",2023-07-20T10:48:30Z,llama,https://github.com/meta-llama/llama/issues/452 451,1813629741,Update download.sh to not use hardcoded bash path for improved portability,"Bash won't always be available at , (e.g. on NixOS), but is more portable and will generally work on such systems",2023-07-20T10:08:44Z,llama,https://github.com/meta-llama/llama/pull/451 450,1813624841,"Cannot set parameters ""max_length"",""max total tokens"" or ""max_input_length"" for meta-llama/Llama-2-7b-chat-hf","### System Info ## 1. I deployed in a VPC according to these parameters: ## 2. Then I call my endpoint to predict. I used many different combinations of these parameters: ## 3. Results #### 3.1 If I specify no ""max_new_tokens"" but try to increase ""max_length"", ""max_input_length"", or ""max_total_tokens"", I recieve this error message if my input exceeds 1000 tokens: ` botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (422) from primary with message ""{""error"":""Input validation error: must have less than 1000 tokens. Given: 1413"",""error_type"":""validation""}""` #### 3.2 If I set max_new_tokens I recieve this error message: inputs max_new_tokens inputs max_new_tokens #### 3.3 If I have a prompt smaller than 1000 tokens it works fine. ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ] My own modifications ### Reproduction 1. Deploy HF model in VPC based without internet 2. Call endpoint to predict with parameters ### Expected behavior I expect the endpoint to set the parameters related to max_length according to my specifications.",2023-07-20T10:05:58Z,llama,https://github.com/meta-llama/llama/issues/450 448,1813616019,"About ""HTTPError: 404 Client Error"" and ""OSError: meta-llama/Llama-2-7b does not appear to have a file named config.json"".","I encountered those errors when I was downloading Llama-2-7b from huggingface. I have full permission for using Llama-2 models and also did .",2023-07-20T10:00:48Z,llama,https://github.com/meta-llama/llama/issues/448 447,1813615598,"uccessfully installed llama-0.0.1, but can not import module?? why? ",">>> import sys >>> for path in sys.path: ... print(path) ... >>> exit() ResourceWarning: Implicitly cleaning up >> import llama Traceback (most recent call last): File """", line 1, in ModuleNotFoundError: No module named 'llama' >>> ",2023-07-20T10:00:32Z,llama,https://github.com/meta-llama/llama/issues/447 446,1813577896,llama2-7b's tokenizer length doesn't match embedding size,the length of tokenizer is 32001 while the embedding size is 32000 * 4096.,2023-07-20T09:39:36Z,llama,https://github.com/meta-llama/llama/issues/446 445,1813496130,Checksum script, ,2023-07-20T08:53:02Z,llama,https://github.com/meta-llama/llama/pull/445 444,1813478291,how to finetun llama 2-7B in lora or p-tuning from another datasets, ,2023-07-20T08:43:10Z,llama,https://github.com/meta-llama/llama/issues/444 442,1813408966,feat(Download.ps1): Add download.ps1 for Windows,"I wanted to share a helpful reference with you for downloading files on Windows OS without having to install wget. However, due to a download limit on the link provided, I was unable to fully test the script. I did successfully download 7B and 7B-chat on my Windows device though.",2023-07-20T08:02:51Z,llama,https://github.com/meta-llama/llama/pull/442 441,1813363684,[download.sh] make downloads resumable for large files,wget --continue can resume downloads if the script was interrupted,2023-07-20T07:37:22Z,llama,https://github.com/meta-llama/llama/pull/441 440,1813357520,"Don't call ""open source"" what isn't?","Meta is muddying the waters using a term that has an industry-wide recognized definition. The Llama2 license is simply not open source, nor free software: ",2023-07-20T07:33:18Z,llama,https://github.com/meta-llama/llama/issues/440 439,1813342450,Enter the URL from email,"This is a form to enable access to Llama 2 on Hugging Face after you have been granted access from Meta. Please visit the Meta website and accept our license terms and acceptable use policy before submitting this form. Requests will be processed in 1-2 days. 根本打不开,有没有大佬给个url, 借用下载一下",2023-07-20T07:23:55Z,llama,https://github.com/meta-llama/llama/issues/439 438,1813337351,Error while downloading Models using download.sh,"hi, I am running download.sh in Windows 11Home OS on Git bash. I am getting scheme missing error. Earlier I was getting wget and md5sum , which are resolved. For scheme missing error this is how the error looks: < : Scheme missing. Checking checksums md5sum: checklist.chk: no properly formatted MD5 checksum lines found Downloading llama-2-13b https Scheme missing. > I tried with downloading 1 model and all models. ",2023-07-20T07:20:29Z,llama,https://github.com/meta-llama/llama/issues/438 437,1813320417,RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())],RuntimeError: Internal: [model_proto->ParseFromArray(serialized.data(), serialized.size())],2023-07-20T07:09:55Z,llama,https://github.com/meta-llama/llama/issues/437 436,1813297427,"RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found?","What is the reason behind and how to fix the error: ? I'm trying to run with: And using: But I'm getting this RuntimeError, Help!",2023-07-20T06:55:08Z,llama,https://github.com/meta-llama/llama/issues/436 435,1813234275,how to use llama2 for instruction?,"Can i use llama2 to train alpaca,vicuna,orca instruction with lora?do I need to change another prompt to do instruction?which is better?",2023-07-20T06:13:29Z,llama,https://github.com/meta-llama/llama/issues/435 434,1813048053,How to Run 70B with 4*A100-80g,"I'd like to try 70B with 4 A100-80g, however, the weights only support 8 mp. How can I convert the 8 mp way weights into 4 mp way?",2023-07-20T02:51:11Z,llama,https://github.com/meta-llama/llama/issues/434 433,1812879396,Unable to run example program - example_text_completion.py,"Unable to run the following command torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 I am running it on MacBook Pro with following configuration. ",2023-07-19T23:29:05Z,llama,https://github.com/meta-llama/llama/issues/433 431,1812851813,How To Run This Modle Localy ,"Any One Help Me To Run This Code On Local without torch run ",2023-07-19T22:59:22Z,llama,https://github.com/meta-llama/llama/issues/431 430,1812797582,Error running `example_chat_completion.py` on `llama-2-7b-chat`,"python 3.8 PyPi running on a nvidia rtx 3900 ",2023-07-19T22:05:43Z,llama,https://github.com/meta-llama/llama/issues/430 429,1812678260,Need help downloading the model files,"I have already tried 6 links, today an yesterday, one of the links worked, but only for a short time and it was able to download only some of the files before going dead. Since then all the links that I try, return the same response: ",2023-07-19T20:25:37Z,llama,https://github.com/meta-llama/llama/issues/429 428,1812674549,failed run to CPU," ",2023-07-19T20:22:44Z,llama,https://github.com/meta-llama/llama/issues/428 426,1812628839,Unable to establish SSL connection,"I downloaded the ckpts successfully yesterday using the given URL in the email, but it is returning SSL connection error today. ",2023-07-19T19:48:45Z,llama,https://github.com/meta-llama/llama/issues/426 425,1812608202,Hardware requirements for Llama 2,"Similar to #79, but for Llama 2. Post your hardware setup and what model you managed to run on it.",2023-07-19T19:33:45Z,llama,https://github.com/meta-llama/llama/issues/425 424,1812474590,Any possibility of getting Llama2 to run on Transformers/AutoTokenizers?,"The current file example uses TorchRun. It would be great if it use an approach more like Falcon, etc. using transformers and AutoTokenizers - when I try, I get a plethera of errors. :-( Something like: ",2023-07-19T18:15:25Z,llama,https://github.com/meta-llama/llama/issues/424 423,1812387884,70B chat wrong shape?,"Running 70B-chat using HuggingFace (4.31.0), I get: RuntimeError: mat1 and mat2 shapes cannot be multiplied (52x8192 and 1x1024) 7 and 13 run fine. Any ideas? (I'm loading in 4bit to fit on my pair of 3090s, < 20GB used each, so model does load.)",2023-07-19T17:26:25Z,llama,https://github.com/meta-llama/llama/issues/423 422,1812379023,Python download script for macos users,I had issues with installing wget on my mac and decided to write a python version. For those who have similar issues.,2023-07-19T17:20:01Z,llama,https://github.com/meta-llama/llama/pull/422 421,1812365145,Added python version of the download script for Mac Users,"I tried to use the download.sh but it required wget, which in turn required brew on Mac. I hope this script will help also others too.",2023-07-19T17:11:17Z,llama,https://github.com/meta-llama/llama/pull/421 420,1812306899,torch.distributed.elastic.multiprocessing.errors.ChildFailedError:,"Running into the same error on the 13b and 70b chat models. Using a h100 80GB card. The 7b chat model works fine. Command (13b): Error: `***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** > initializing model parallel with size 2 > initializing ddp with size 1 > initializing pipeline with size 1 Traceback (most recent call last): File ""example_chat_completion.py"", line 149, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File ""example_chat_completion.py"", line 20, in main generator = Llama.build( File line 69, in build torch.cuda.set_device(local_rank) File line 350, in set_device torch._C._cuda_setDevice(device) RuntimeError: CUDA error: invalid device ordinal CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with to enable device-side assertions. WARNING Sending process 74007 closing signal SIGTERM ERROR failed (exitcode: 1) local_rank: 1 (pid: 74008) of binary: Traceback (most recent call last): File line 11, in load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')() File line 344, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_chat_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-07-19_16 42 host : 209-20-158-162 rank : 1 (local_rank: 1) exitcode : 1 (pid: 74008) error_file: traceback : To enable traceback see: `",2023-07-19T16:34:40Z,llama,https://github.com/meta-llama/llama/issues/420 419,1812300717,"Docker LLaMA2 Chat, 3 STEPS :-D","TLDR; This project has been tested by 4090 and costs 8 ~ 14G vRAM. It's too late, get up tomorrow and continue to update, if you pass the test, you can update it in the post 🍺",2023-07-19T16:30:29Z,llama,https://github.com/meta-llama/llama/issues/419 418,1812269633,ERROR 403: Forbiden,"Hello, when I tried to download weights, it seems that there are some errors to download it. Can you help me to figure this out? I keep receiving this error message: ",2023-07-19T16:10:40Z,llama,https://github.com/meta-llama/llama/issues/418 417,1812257285,Missing file params.json,"Can someone post here the params.json file, for some reason it did not download it for me. ",2023-07-19T16:02:30Z,llama,https://github.com/meta-llama/llama/issues/417 416,1812255713,Not receiving the download link.,"I have already received 3 download links that either got 403 or got files partially Now waiting for the 4th link, but not receiving it. How much do we need to wait for the download link email? ",2023-07-19T16:01:31Z,llama,https://github.com/meta-llama/llama/issues/416 415,1812238966,AssertionError: Loading a checkpoint for MP=2 but world size is 1,"Hello,I'm trying to run llama-2-13b-chat with this command: $ torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 4 get this error: Traceback (most recent call last): File line 73, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 20, in main generator = Llama.build( File line 80, in build assert model_parallel_size == len( AssertionError: Loading a checkpoint for MP=2 but world size is 1 ERROR failed (exitcode: 1) local_rank: 0 (pid: 2219637) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: Thanks for any help!",2023-07-19T15:50:47Z,llama,https://github.com/meta-llama/llama/issues/415 414,1812197704,403 errors on consolidated-02 for 70B-chat,"download.sh is getting a 403 error for consolidated.02.pth this has happened on multiple retries .. other shards download fine --2023-07-19 10 09-- ..... Resolving download.llamameta.net (download.llamameta.net)... 18.160.225.23, 18.160.225.122, 18.160.225.113, ... Connecting to download.llamameta.net (download.llamameta.net)|18.160.225.23|:443... connected. HTTP request sent, awaiting response... 403 Forbidden. 2023-07-19 10 09 ERROR 403: Forbidden.. ",2023-07-19T15:28:10Z,llama,https://github.com/meta-llama/llama/issues/414 413,1812137515,How to run download.sh in windows 10 computer without wget,"I am trying to download the weigths for llma-2-13B-chat by running download.sh. I did chmod 755 download.sh and then It gives me error that 'wget' is not installed in my Windows 10 computer (given by office) If some other developers have faced the same issue of installing wget, any suggestions are truly appreciated. Also note that, the available versions of wget are for 32-bit versions and that also requires WSL installed in Windows and needs Linux subsystem. But, I can not install linux sub system in my office computer. Is there simpler way of installing wget in windows 10? (e.g. some pre-built binaries?) Also, if there are any other alternative ways to download the model: llama-2-13B-chat please let me know. For example, using python module: request.get(url,verify=my_certificate) Any suggestions are highly appreciated.",2023-07-19T14:56:14Z,llama,https://github.com/meta-llama/llama/issues/413 412,1812099937,Error on download.sh download.sh: 23: Syntax error:,"When running the download.sh script I am getting a syntax error ",2023-07-19T14:36:38Z,llama,https://github.com/meta-llama/llama/issues/412 411,1812094561,"""Unable to establish SSL connection"" error when running ./download.sh","Hi, I got ""Unable to establish SSL connection"" error when running in my Windows system: Can somebody help?",2023-07-19T14:33:58Z,llama,https://github.com/meta-llama/llama/issues/411 409,1812003594,"README.md is executable, `download.sh` is not","Cloning the repo, it looks like couldn't really work as it is not executable, but is (as is ), which seems like a mistake.",2023-07-19T13:52:27Z,llama,https://github.com/meta-llama/llama/issues/409 408,1811961466,"added links to 7b, 13b, 70b Chatbot demos on Spaces", ,2023-07-19T13:32:28Z,llama,https://github.com/meta-llama/llama/pull/408 407,1811909659,Error: 70B Model quantizing on mac: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192,"Used this model: Used these commands: 7B and 11B models work without any problems. This is only when using the 70B model. _error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024_ ",2023-07-19T13:07:22Z,llama,https://github.com/meta-llama/llama/issues/407 406,1811838597,Llama 2: Using in languages other than English.,Why do you have a restriction on using Llama 2 in languages other than English?,2023-07-19T12:25:55Z,llama,https://github.com/meta-llama/llama/issues/406 405,1811795343,Am not able to run 13b-chat on my Mac M1 Pro,"when. I try this code `torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 4` it show this error : any idea how to solve this ?",2023-07-19T11:59:09Z,llama,https://github.com/meta-llama/llama/issues/405 404,1811792796,Starting example_chat_completion.py on M1 Mac drops errors,"Hi there, Download and installation works great, but I got errors with examples. Here is what I did: - I created and activated a conda environment and installed necessary dependencies - pip install -e . and copy paste the example. I got this. Any idea, what I did wrong?: (llama2) $ torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 4 NOTE: Redirects are currently not supported in Windows or MacOs. Traceback (most recent call last): File line 73, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( ^^^^^^^^^^^^^^^^^^^^ File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^ File line 20, in main generator = Llama.build( ^^^^^^^^^^^^ File line 62, in build torch.distributed.init_process_group(""nccl"") File line 907, in init_process_group default_pg = _new_process_group_helper( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 1013, in _new_process_group_helper raise RuntimeError(""Distributed package doesn't have NCCL "" ""built in"") RuntimeError: Distributed package doesn't have NCCL built in ERROR failed (exitcode: 1) local_rank: 0 (pid: 7780) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) ^^^^^^ File line 346, in wrapper return f(*args, **kwargs) ^^^^^^^^^^^^^^^^^^ File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_chat_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-07-19_13 49 host : host.local rank : 0 (local_rank: 0) exitcode : 1 (pid: 7780) error_file: traceback : To enable traceback see: ============================================================ Any advice is highly appreciated, thanks Nasinasi",2023-07-19T11:57:49Z,llama,https://github.com/meta-llama/llama/issues/404 402,1811626674,download.sh doesn't download weights,"self-explanatory. i'm just running download script, pasting it url from e-mail, and get only couple small config files. no shards are downloaded, no errors are reported either. ",2023-07-19T10:12:01Z,llama,https://github.com/meta-llama/llama/issues/402 401,1811574444,"About new tokens, B_INST, E_INST etc","I see that you are using these tokens for chat generation, but when I try to encode them, it looks like they are not actual tokens themselves. I am wondering if this is intended or they are supposed to be tokens individually(like B_INST itself being a separate token)",2023-07-19T09:40:17Z,llama,https://github.com/meta-llama/llama/issues/401 399,1811431065,Runtime Error while creating spaces in Huggingface using chat-hf (Could not find model),"Hi, I am getting below when I am trying to load model in Huggingface spaces. Tried with 70B and 13B hf chat models and get the same with both: Runtime error GradioDeprecationWarning: gr.Interface.load() will be deprecated. Use gr.load() instead. Fetching model from: Traceback (most recent call last): File line 3, in File line 98, in load return external.load( File line 70, in load return load_blocks_from_repo( File line 109, in load_blocks_from_repo blocks: gradio.Blocks = factory_methodssrc File line 149, in from_model response.status_code == 200 AssertionError: Could not find model: If it is a private or gated model, please provide your Hugging Face access token ( as the argument for the parameter. ",2023-07-19T08:23:17Z,llama,https://github.com/meta-llama/llama/issues/399 398,1811424895,The experimental results in the llama paper cannot be reproduced.,"Hello, thank you for your contributions to the development of the large language model. While testing Llama7b on the Openbookqa dataset, I noticed that the results differ from the ones reported in the original paper. I treated it as a text completion task, iterating through all candidate answers to find the one with the minimum loss as the correct result. However, the accuracy was only 38.4%, whereas the original paper reported 57.2%. I would like to inquire about the experimental setup used in the original paper.",2023-07-19T08:19:33Z,llama,https://github.com/meta-llama/llama/issues/398 397,1811418418,Add files via upload,llama2 http接口方式启动,2023-07-19T08:15:43Z,llama,https://github.com/meta-llama/llama/pull/397 396,1811414852,Unable to run example_chat_completion.py,"ModuleNotFoundError: No module named 'fire' ERROR failed (exitcode: 1) local_rank: 0 (pid: 2553) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example_chat_completion.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Also the fire module is installed, then also its showing that its not installed. I am doing this on a aws ec2 instance",2023-07-19T08:13:31Z,llama,https://github.com/meta-llama/llama/issues/396 395,1811413723,Is it legal for llama2 to fine tune with other language?,"It's not prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2. But it's mentioned in the hugging face model card. Is it not intended but allowed? Thank you.",2023-07-19T08:12:48Z,llama,https://github.com/meta-llama/llama/issues/395 394,1811403910,No config.json in meta-llama/Llama-2-7b,"I downloaded the model from the huggingface. And I tested it with the following code: Then I got the following output: ",2023-07-19T08:07:16Z,llama,https://github.com/meta-llama/llama/issues/394 393,1811364183,Whether transformers was used?,Thank you for open sourcing these models. Have you used the open source library: transformers?,2023-07-19T07:46:30Z,llama,https://github.com/meta-llama/llama/issues/393 392,1811324336,How to run Llama-2 `example_chat_completion.py` with multi GPUs?,I have 4*T4 and I found the whole model was loaded in the 1st one. How can utilize all 4 GPUs?,2023-07-19T07:19:59Z,llama,https://github.com/meta-llama/llama/issues/392 391,1811266507,"ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set","Tried setting the value of RANK to 1 with and then was asked about WORLD_SIZE which I set to 1 as well, then MASTER_ADDR=localhost and last MASTER_PORT=12345. Now it's stuck in a loop sending another fail message: [W [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:12345 (system error: 10049 - unknown error). The way I ran was, setting up everything as asked, then editing example_chat_completion.py with the proper paths and then running it with the right conda env. I'm on windows so I had to use to run the download.sh, apart from that everything else was run on a admin cmd.",2023-07-19T06:41:36Z,llama,https://github.com/meta-llama/llama/issues/391 390,1811243031,added type hint in example code,added type hint in example for easier understanding ,2023-07-19T06:23:19Z,llama,https://github.com/meta-llama/llama/pull/390 389,1811221964,How many blocks of 80g a100 are needed to fine-tune the 70b model, ,2023-07-19T06:07:09Z,llama,https://github.com/meta-llama/llama/issues/389 388,1811153139,"download error, md5sum: checklist.chk: no properly formatted MD5 checksum lines found","error log: ",2023-07-19T05:07:02Z,llama,https://github.com/meta-llama/llama/issues/388 387,1811100597,Invalid Host Name," I have winget and md5sums installed, and even have the unique URL but still am facing this issue, while running _""bash download.sh""_. Any would be appreciative. **Thanks**",2023-07-19T04:01:41Z,llama,https://github.com/meta-llama/llama/issues/387 386,1811082946,Incorrect upload of meta-llama/Llama-2-13b-chat-hf model on Huggingface,"It seems that there might be an error in the upload of on Huggingface. The sizes do not match up. The model contains 6 checkpoints, approximately 52GB in total, whereas the model only includes 3 checkpoints, around 26GB (same as and Could you please confirm if there was an error in the upload of ",2023-07-19T03:36:42Z,llama,https://github.com/meta-llama/llama/issues/386 385,1811015981,"What is the difference between a model with the suffix ""chat"" and one without it?","As mentioned, any assistance would be greatly appreciated. Thank you very much!",2023-07-19T02:25:49Z,llama,https://github.com/meta-llama/llama/issues/385 384,1810974577,Grouped-Query Attention,"Hello Meta GenAI team (cc With regards to the 70B model, I'm currently looking into the implementation of the GQA architecture -- specifically after noticing the 8192 x 1024 layer shapes, I was trying to identify the conditional GQA parts in your reference implementation but couldn't pin it down. Given that there are some conditions that smell suspiciously GQA-related, could you please elaborate on the parts of the implementation that enable this architecture specifically for the 34B 70B models? Thanks",2023-07-19T01:27:56Z,llama,https://github.com/meta-llama/llama/issues/384 383,1810967129,Request granted yet 403 Forbidden,"I was granted llama2 model weight access (bipashabanerjee around 8 PM EST on Jul 18. I am getting 403 Forbidden when I try to download of the models. Along with 403 Forbidden, I also got the following error: checklist.chk: no properly formatted MD5 checksum lines found.",2023-07-19T01:20:05Z,llama,https://github.com/meta-llama/llama/issues/383 382,1810939123,403 error 下载模型出问题,应该是邀请链接的问题,更新一个新的邀请链接就好了,"👏👏 欢迎大家进群沟通: ",2023-07-19T00:52:23Z,llama,https://github.com/meta-llama/llama/issues/382 381,1810933702,How can We run Llama-2 in a low spec GPU? 6GB VRAM,"As many of us I don´t have a huge CPU available but I do have enogh RAM, even with it´s limitations, it´s even possible to run Llama on a small GPU? RTX 3060 with 6GB VRAM here. Of course i got the usual error: `File line 262, in __init__ self.weight = Parameter(torch.Tensor(self.output_size_per_partition, self.in_features)) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 6.00 GiB total capacity; 5.28 GiB already allocated; 0 bytes free; 5.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF ERROR failed (exitcode: 1) local_rank: 0 (pid: 9010) of binary: and i know is just the first day until we can get some documentation for this kind of situation, but probably someone did the job with Llama-1 and is not as hard as just parameters (I Hope) I only want to run the example text completion torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4 Can i use the VRAM and RAM at the same time?",2023-07-19T00:44:36Z,llama,https://github.com/meta-llama/llama/issues/381 380,1810929193,"RuntimeError: probability tensor contains either `inf`, `nan` or element < 0"," RuntimeError: probability tensor contains either , or element < 0 I got this error while doing inference for text generation, in particular when the batch size is great than 1. I did not get this error and generate correctly when the batch size is set to 1. Does anyone see the same issue? ",2023-07-19T00:38:44Z,llama,https://github.com/meta-llama/llama/issues/380 379,1810919886,New Multi-modal LLM support for LLaMA-2,"We are happy that Meta releasing such powerful LLM, and we are happy to add the integration of LLaMA-2 into our mPLUG-Owl, a modularized multi-modal large language model. ",2023-07-19T00:25:48Z,llama,https://github.com/meta-llama/llama/issues/379 378,1810911524,"Cannot load ""meta-llama/Llama-2-70b-hf"" and meta-llama/Llama-2-70b-chat-hf""","After downloading the weights of llama 2 70b from hf, I tried to load the weights using However, I got a list of errors: size mismatch for model.layers.77.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]). size mismatch for model.layers.78.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]). size mismatch for model.layers.78.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]). size mismatch for model.layers.79.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]). size mismatch for model.layers.79.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]). You may consider adding in the model method. I only have this error for 70b and 70b chat, not for smaller llama 2 models. Has everyone encountered the same error? ",2023-07-19T00:16:47Z,llama,https://github.com/meta-llama/llama/issues/378 375,1810800325,Dev, ,2023-07-18T22:17:58Z,llama,https://github.com/meta-llama/llama/pull/375 374,1810795010,can't run llama-2-7b-hf even though I'm using use_auth_token,"Error: ",2023-07-18T22:12:14Z,llama,https://github.com/meta-llama/llama/issues/374 373,1810788684,download.sh returns 403 forbidden error,"**What's Happening** When attempting to download the 70B-chat model using download.sh, the model itself returns a 403 forbidden code. **Traceback** *Note the the policy has been removed to maintain security. **Steps to Reproduce** ",2023-07-18T22:05:58Z,llama,https://github.com/meta-llama/llama/issues/373 372,1810784127,download.sh: 12: [[ Not Found (Cannot Download Models),"Hello, I recently gained access to the Llama-2 models, but every time I try to use download.sh, I get the following: My link is valid, with no extra characters as far as I can tell, so what could be going wrong here?",2023-07-18T22:01:34Z,llama,https://github.com/meta-llama/llama/issues/372 371,1810750548,"ERROR: cannot verify download.llamameta.net's certificate,", ,2023-07-18T21:30:24Z,llama,https://github.com/meta-llama/llama/issues/371 370,1810689845,download.sh closes without downloading anything.,"I've tried poddering through a few of the tips here, but the typical tips of 'make sure the URL is correct' etc haven't had any impact. It runs, I input the URL, I specify a model, it zips through a few lines of code, then closes the bash terminal without doing anything. Any assistance would be helpful. Thanks.",2023-07-18T20:45:42Z,llama,https://github.com/meta-llama/llama/issues/370 368,1810682437,RLHF versions availability,"Hi, In both email and , only non-RLHF versions are mentioned. Are the RLHF versions available from the official download? > Model weights available: > * Llama-2-7b > * Llama-2-7b-chat > * Llama-2-13b > * Llama-2-13b-chat > * Llama-2-70b > * Llama-2-70b-chat ",2023-07-18T20:40:04Z,llama,https://github.com/meta-llama/llama/issues/368 367,1810661345,Force /bin/bash in download.sh,The script fails in zsh on the if conditions.,2023-07-18T20:24:27Z,llama,https://github.com/meta-llama/llama/pull/367 366,1810660628,VRAM required for inference,"For each size of Llama 2, roughly how much VRAM is needed for inference",2023-07-18T20:23:55Z,llama,https://github.com/meta-llama/llama/issues/366 365,1810635183,Update README.md, ,2023-07-18T20:06:46Z,llama,https://github.com/meta-llama/llama/pull/365 363,1810624635,llama 2 70B-chat consolidated.04.pth causes download error,"Following the download instructions in the readme, I am able to download the 7B-chat and 13B-chat models. However, the 70B-chat model download breaks everytime at exactly this results in the message that people find at the bottom, since it tries to validate files that have not been downloaded.",2023-07-18T19:58:27Z,llama,https://github.com/meta-llama/llama/issues/363 362,1810602359,Checking checksums - Could not parse check file 'checklist.chk' (2),"the error occurs when running the download.sh script. happened to me on MacOS, Apple Sillicon",2023-07-18T19:39:25Z,llama,https://github.com/meta-llama/llama/issues/362 361,1810592558,Unable to establish SSL connection. No properly formatted MD5 checksum lines,"Connecting to download.llamaneta.net (download. connected. Unable to establish SSL connection. Checking checksums md5sum: checklist.chk: no properly formatted MD5 checksum lines",2023-07-18T19:31:17Z,llama,https://github.com/meta-llama/llama/issues/361 360,1810559046,Unable to download llama2 using pre-signed URL link,"I just get an error: I got the link today at 11:39am PST",2023-07-18T19:16:28Z,llama,https://github.com/meta-llama/llama/issues/360 359,1810548905,HuggingFace models have `max_position_embeddings` set incorrectly,"The converted HuggingFace models have set to 2048 instead of 4096 in (e.g. While this doesn't directly affect generation, it is inefficient since the embedding frequencies will be re-calculated for every token after 2048. Moreover, some uses (like Dynamic RoPE scaling) rely on this value to be the original size of the model's pre-trained context length, so it would be helpful to have it corrected in the official repos 🙂 ",2023-07-18T19:12:44Z,llama,https://github.com/meta-llama/llama/issues/359 357,1810502564,Error running download sh,"When I try to run the download script and enter the URL and model_size as requested, I got the following error Any idea how to address it?",2023-07-18T18:48:23Z,llama,https://github.com/meta-llama/llama/issues/357 356,1810461184,Convert to HF format,"I am converting the llama-2-7b-chat weights (and then the others) to huggingface format. (yes, I am impatient to wait for the one HF will host themselves in 1-2 days.) I am using the existing llama conversion script in the transformers repo: Does anyone know where to find the numbers for the llama-2 models? The number of shards for each model can be seen in the download.sh file.",2023-07-18T18:17:53Z,llama,https://github.com/meta-llama/llama/issues/356 355,1810416073,Could not parse check file 'checklist.chk' (2)," [ <=> ] 47.63K in 0.06s 2023-07-18 13 59 (767 - saved [48771] Checking checksums Could not parse check file 'checklist.chk' (2) Any solutions?",2023-07-18T17:41:57Z,llama,https://github.com/meta-llama/llama/issues/355 354,1810392484,[llama2] 403 forbidden when downloading some of the weights,"I got 403 Forbidden when downloading *some* of the weights. In the message below it successfully downloads 03 and 07 but fails on 04, 05, and 06. (The keys in the urls are omitted) ",2023-07-18T17:23:23Z,llama,https://github.com/meta-llama/llama/issues/354 353,1810340518,[llama2] checksum did NOT match,"Hi, I'm getting a warning regarding the checksum when downloading llama2. I'm wondering whether it is problem from the model weights or the checksum itself. Thanks,",2023-07-18T16:55:34Z,llama,https://github.com/meta-llama/llama/issues/353 352,1810338703,`md5sum: checklist.chk: no properly formatted MD5 checksum lines found`,"I encounter this error message when running the download script: ",2023-07-18T16:54:51Z,llama,https://github.com/meta-llama/llama/issues/352 351,1810305710,no properly formatted MD5 checksum lines found,"I get this while run download script: It this script require some certain version of md5sum ?",2023-07-18T16:38:28Z,llama,https://github.com/meta-llama/llama/issues/351 350,1810295425,download.sh redirects to http://www.facebook.com/unsupportedbrowser,"I submitted the form for approval for download and models, and received the email with the download link. When I enter this URL into and select a model to download, the script fails to download any model files. The script appears to be redirecting each file to and proceeds to download a 47.78KB file. I'm using the provided download.sh script and running it in bash on Ubuntu 22.04, so I'm not sure what else I'm supposed to do to get past this error. `Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 7B Downloading LICENSE and Acceptable Usage Policy --2023-07-18 12 12-- Resolving l.facebook.com (l.facebook.com)... 2a03 f103 face 0:14c9, 31.13.66.36 Connecting to l.facebook.com (l.facebook.com)|2a03 f103 face 0 443... connected. HTTP request sent, awaiting response... 302 Found Location: [following] --2023-07-18 12 12-- Resolving www.facebook.com (www.facebook.com)... 2a03 f103 face 0:25de, 31.13.66.35 Connecting to www.facebook.com (www.facebook.com)|2a03 f103 face 0 443... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified Saving to: [ <=> ] 47.78K in 0.02s 2023-07-18 12 12 (2.14 - saved [48924] Checking checksums md5sum: checklist.chk: no properly formatted MD5 checksum lines found `",2023-07-18T16:31:11Z,llama,https://github.com/meta-llama/llama/issues/350 348,1805550331,"new token, downloading 13B","On June 21 I (fox received access but it turns out that the 13B model that was downloaded did not work. Can I get a new token or other help to obtain the 13B model? Many thanks! Issues: 1. checklist.chk and params.json were empty 2. consolidated.01.pth is a lot smaller than the other model checkpoint file. ",2023-07-14T20:44:28Z,llama,https://github.com/meta-llama/llama/issues/348 347,1801126218,"Issue with Redirects Not Supported Error in Windows and macOS. When running torchrun, a RuntimeError is encountered with the message ""unmatched '}' in format string.""","**Description:** When running the command, a RuntimeError is encountered with the message ""unmatched '}' in format string."" Run command I encountered an issue while running a script that involves redirecting output. It seems that redirects are currently not supported in Windows environments This issue causes a runtime error with the following traceback: **Environment:** Operating System: Windows10 Python Version: Python 3.9.13 Torch Version: 2.0.1 Please let me know if any further information is required to address this issue.",2023-07-12T14:39:56Z,llama,https://github.com/meta-llama/llama/issues/347 346,1783746202,Can I use same architecure picture between Transformer and LLaMA ?,"Hello everyone, I wonder that the main architecture between Transformer and LLaMA are the same, both are encoder-decoder model. The main difference with the Transformer architecture: - RMSNorm normalizing function - The ReLU non-linearity is replaced by the SwiGLU activation function - Absolute positional embeddings are removed and instead rotary positional embeddings (RoPE) are added at each layer of the network Basically, I can use the picture from Transformer and editting three different parts, right ? Picture from Language Modeling with nn.Transformer and torchtext — PyTorch Thanks",2023-07-01T09:44:19Z,llama,https://github.com/meta-llama/llama/issues/346 345,1779422463,Requesting an extension to the 7-day validity of the download link --> What is the process,"Hi, I have been given access to the model with a 7-day valid download link. However, I need more time to organize the computing resources needed to download and run the model. What is the process to request an extension to the 7-day validity period? The request was from my email gaurav.narasimhan Do I need to raise another request (which the last time took several days to approve) -- or is there a separate process to get an extension to the 7-day validity?",2023-06-28T17:47:43Z,llama,https://github.com/meta-llama/llama/issues/345 344,1777774769,Is it possible to run 7B on a MacBook Pro M1 with 16MB Ram?,"Hello, I am totally new to AI and Llama, but with ChatGPT's help am trying to learn. I have a fair amount of experience coding econometrics (matrix algebra in SAS and Stata) and ChatGPT 4.0 did miracles to help me get started with GIS scripts in R, so I thought this might be possible. Perhaps I got too ambitious...! Anyway, I would dearly like to learn more about LLMs to see if I can somehow create one for my elderly mother who has dementia. I think a patient, kind chatbot with knowledge of her past and the ability to engage with her in a suitable way to avoid agitation could really improve her quality of life. Certainly way more qualified than me working on this, but we really need it sooner rather than later so I'm giving it a Hail Mary shot. I have a 2021 MacBook Pro M1 with 16MB RAM. I've now downloaded the 7B model and tried running it in several different ways following advice from ChatGPT, who tried to refine the 'example.py' code to try to run on my machine. However in the end as my Macbook does not have an Nvidia GPU, ChatGPT has more or less told me I've bitten off more than I can chew. I'm considering migrating to Google Colab (under watchful guidance of ChatGPT), but would be grateful for any human comments or suggestions.",2023-06-27T21:23:12Z,llama,https://github.com/meta-llama/llama/issues/344 343,1777518663,Issue downloading weights and Tokenizer,"I followed the steps in the README and properly edited the download.sh file to include the Target folder where the model weoghts should be downoaded. It returns the errors in the below file. Please advise! karimoweiss.txt ",2023-06-27T18:28:10Z,llama,https://github.com/meta-llama/llama/issues/343 342,1775211191,Required number of GPUs to TRAIN LLaMA 7b ,"Hi, thank you for the amazing work! I'm wondering, as I tried to fine-tune LLaMA-7b with 1x NVIDIA A100-80GB to no avail, what is the minimum number of GPUs to train this smallest variant of LLaMA? I managed to train it with 2x NVIDIA A100-80GB, but I wonder if I did something inefficient and maybe I could've trained LLaMA 7b with only 1 GPU. To be more specific, the successful training requires 2x 80GB GPUs to run: Does this look normal or does it seem that I do something wrong? Looking forward to hearing back from you! ",2023-06-26T17:04:02Z,llama,https://github.com/meta-llama/llama/issues/342 341,1774078683,EC2-T4 (1 GPU) - AssertionError: Loading a checkpoint for MP=0 but world size is 1. ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 31594) of binary: /home/ec2-user/anaconda3/bin/python,"I have looked into the other (related issue) which was closed when the issue was about MODEL_SIZE vs $MODEL_SIZE... but the current code for Inference is different (from maybe hwen the earlier issue was reported). Here it is for reference Inference The provided example.py can be run on a single or multi-gpu node with torchrun and will output completions for two pre-defined prompts. Using TARGET_FOLDER as defined in download.sh: torchrun --nproc_per_node MP example.py --ckpt_dir --tokenizer_path ********* this is what I am running fyi *********** torchrun --nproc_per_node 1 example.py --ckpt_dir --tokenizer_path ***** Instance specs ******* ec2, T4 instance, 1 GPU. ",2023-06-26T07:04:41Z,llama,https://github.com/meta-llama/llama/issues/341 340,1773930127,issue with downloading the model weights,"I followed the instruction in the email sent to me and bashed download.sh but got the following error: what should I do?",2023-06-26T05:43:46Z,llama,https://github.com/meta-llama/llama/issues/340 339,1773187535,Do you have plan to traing multi LLM model for public?,"The Llama 7b,13B,33B,65B is very famous in the whole world. Do you have the plan to expand the work vocab size or multi languages? for example add more chinese ,Korean,Jpanese and so on. ",2023-06-25T10:01:18Z,llama,https://github.com/meta-llama/llama/issues/339 337,1771680149,Link sent by llmaccess@extern.facebookmail.com does not appear to work,"Thank you for accepting my application for access to LLAMA. Unfortunately, the link you sent me does not work. I apologize for posting an issue for this, but was uncertain as to how else I could notify you. CloudFront responds with: My email address is smjones at lanl.gov.",2023-06-23T15:49:03Z,llama,https://github.com/meta-llama/llama/issues/337 336,1770405069,30B =! 33B,"You show a table that says that the model has 33B, but at the moment of downloading you must specify 30B, why is this? PRESIGNED_URL="""" # replace with presigned url from email MODEL_SIZE=""7B,13B,30B,65B"" # edit this list with the model sizes you wish to download TARGET_FOLDER="""" ",2023-06-22T21:14:36Z,llama,https://github.com/meta-llama/llama/issues/336 335,1769626624,Runing 7B: CUDA out of memory with 256gb ram?,"I am running out of memory when i try to run the 7B model and i cannot figure out why. I am using a a computing instance on Azure ML studio with 24 cores, 224 GB RAM, 1440 GB disk and 4 x NVIDIA Tesla K80. This is the error message: And these outputs are are from just before the error Occurred: There is probably something here that I am missing or have misunderstood. I Hope you can help me. Thanks!",2023-06-22T12:58:59Z,llama,https://github.com/meta-llama/llama/issues/335 334,1768981171,Running example on WSL RTX 3090,"I am trying to run the following command: I am getting the following error: The similar error gets repeated for every layer ",2023-06-22T05:59:00Z,llama,https://github.com/meta-llama/llama/issues/334 333,1768645249,Running example on MacBookPro,"I've just received LLAMA access and trying to run the example.py provided. My MacBookPro with an Apple M1 Pro chip isn't showing as supporting CUDA. What is needed to make the example.py code work? Running this command: torchrun --nproc_per_node 1 example.py --device cpu --ckpt_dir --tokenizer_path Here's the error messages received. Is CUDA supported by this system?False CUDA version: None local rank 0 world size 1 > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Traceback (most recent call last): File line 123, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 82, in main generator = load( File line 44, in load assert world_size == len( AssertionError: Loading a checkpoint for MP=0 but world size is 1 ERROR failed (exitcode: 1) local_rank: 0 (pid: 17089) of binary: Fatal Python error: Segmentation fault ",2023-06-21T23:07:18Z,llama,https://github.com/meta-llama/llama/issues/333 332,1768475230,I get the following error when I click on the link in the email,"Here is the error: This XML file does not appear to have any style information associated with it. The document tree is shown below. Access ",2023-06-21T20:59:51Z,llama,https://github.com/meta-llama/llama/issues/332 331,1768065869,Model Weights Link Access Denied,"Clicking on the emailed linked leads to page with the following error: AccessDeniedAccess ",2023-06-21T17:14:33Z,llama,https://github.com/meta-llama/llama/issues/331 330,1760123977,Llama access URL not received recently,"I've been actively using Llama and am quite appreciative of your work. However, I'm currently facing an issue. I received an access URL last month, which worked fine, but when I filled out the form last week, I didn't receive any email providing me with a new access URL. I've double-checked my spam folder and haven't seen any emails from Meta LLaMa team. Can anyone confirm if there are recent cases of successful URL deliveries? Additionally, I would like to inquire about the validity period of these URLs. From my understanding, it seems the URL expires after 7 days of receipt. Is this truly the case? If so, could you possibly clarify why such a restriction is in place? Your assistance would be greatly appreciated. Best regards, Yilei",2023-06-16T07:51:14Z,llama,https://github.com/meta-llama/llama/issues/330 329,1752553374,Does anyone who noticed the repetition of Vocabulary?,"Hi, I would like to ask if you have found any problems with duplicate subwords in the 32k Vocabulary when using the vanilla llama model? (e.g. 405 and 29940 both correspond to ""N"" in the Vocabulary) I have recently been trying to analyse the llama code-generation process, does this problem of duplicate subwords cause llama to have different cuts for the same word during training, or to learn different ways of generating the same word generation? This seems a bit strange and I would like to hear your opinion, Thanks! ",2023-06-12T11:37:33Z,llama,https://github.com/meta-llama/llama/issues/329 326,1743060146,There is bug in trainer:indices should be either on cpu or on the same device as the indexed tensor (cpu),"**I'm sure there is no problem with my code because others work fine. I'm guessing it's an issue with incompatible environment configurations.** Parameter Offload: Total persistent parameters: 266240 in 65 params 0%| | [00:00 train() File line 112, in train trainer.train() File line 1661, in train return inner_training_loop( File line 1946, in _inner_training_loop tr_loss_step = self.training_step(model, inputs) File line 2756, in training_step loss = self.compute_loss(model, inputs) File line 2781, in compute_loss outputs = model(**inputs) File line 1194, in _call_impl return forward_call(*input, **kwargs) File line 15, in wrapped_fn ret_val = func(*args, **kwargs) File line 1733, in forward loss = self.module(*inputs, **kwargs) File line 1212, in _call_impl result = forward_call(*input, **kwargs) File line 688, in forward outputs = self.model( File line 1212, in _call_impl result = forward_call(*input, **kwargs) File line 570, in forward layer_outputs = torch.utils.checkpoint.checkpoint( File line 249, in checkpoint return CheckpointFunction.apply(function, preserve, *args) File line 107, in forward outputs = run_function(*args) File line 566, in custom_forward return module(*inputs, output_attentions, None) File line 1212, in _call_impl result = forward_call(*input, **kwargs) File line 292, in forward hidden_states, self_attn_weights, present_key_value = self.self_attn( File line 1212, in _call_impl result = forward_call(*input, **kwargs) File line 202, in forward query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids) File line 134, in apply_rotary_pos_emb cos = cos[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim] RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) **Here is part of my conda list:** # Name Version Build Channel _libgcc_mutex 0.1 conda_forge _openmp_mutex 4.5 2_gnu absl-py 1.4.0 pypi_0 pypi accelerate 0.19.0 pypi_0 pypi aiohttp 3.8.4 pypi_0 pypi aiosignal 1.3.1 pypi_0 pypi anyio 3.7.0 pypi_0 pypi appdirs 1.4.4 pypi_0 pypi async-timeout 4.0.2 pypi_0 pypi attrs 23.1.0 pypi_0 pypi bcrypt 4.0.1 pypi_0 pypi blas 1.0 mkl brotli 1.0.9 h166bdaf_8 brotli-bin 1.0.9 h166bdaf_8 bzip2 1.0.8 h7f98852_4 ca-certificates 2023.5.7 hbcca054_0 certifi 2023.5.7 pyhd8ed1ab_0 cffi 1.15.1 pypi_0 pypi charset-normalizer 3.1.0 pyhd8ed1ab_0 click 8.1.3 pypi_0 pypi contourpy 1.0.7 pypi_0 pypi cryptography 41.0.1 pypi_0 pypi cudatoolkit 11.3.1 h9edb442_11 cycler 0.11.0 pypi_0 pypi datasets 2.12.0 pypi_0 pypi deepspeed 0.9.3+e02b8d0b pypi_0 pypi dill 0.3.6 pypi_0 pypi docker-pycreds 0.4.0 pypi_0 pypi exceptiongroup 1.1.1 pypi_0 pypi fastapi 0.96.0 pypi_0 pypi ffmpeg 4.3 hf484d3e_0 ffmpy 0.3.0 pypi_0 pypi filelock 3.12.0 pypi_0 pypi fire 0.5.0 pypi_0 pypi fonttools 4.39.4 pypi_0 pypi freetype 2.12.1 hca18f0e_1 frozenlist 1.3.3 pypi_0 pypi fsspec 2023.5.0 pypi_0 pypi gitdb 4.0.10 pypi_0 pypi gitpython 3.1.31 pypi_0 pypi gmp 6.2.1 h58526e2_0 gnutls 3.6.13 h85f3911_1 gradio 3.9 pypi_0 pypi h11 0.12.0 pypi_0 pypi hjson 3.1.0 pypi_0 pypi httpcore 0.15.0 pypi_0 pypi httpx 0.24.1 pypi_0 pypi huggingface-hub 0.15.1 pypi_0 pypi idna 3.4 pyhd8ed1ab_0 intel-openmp 2021.4.0 h06a4308_3561 jinja2 3.1.2 pypi_0 pypi joblib 1.2.0 pypi_0 pypi jpeg 9e h0b41bf4_3 kiwisolver 1.4.4 pypi_0 pypi lame 3.100 h166bdaf_1003 lcms2 2.12 hddcbb42_0 ld_impl_linux-64 2.40 h41732ed_0 lerc 3.0 h9c3ff4c_0 libbrotlicommon 1.0.9 h166bdaf_8 libbrotlidec 1.0.9 h166bdaf_8 libbrotlienc 1.0.9 h166bdaf_8 libdeflate 1.10 h7f98852_0 libffi 3.4.2 h7f98852_5 libgcc-ng 12.2.0 h65d4601_19 libgomp 12.2.0 h65d4601_19 libiconv 1.14 0 libnsl 2.0.0 h7f98852_0 libpng 1.6.39 h753d276_0 libsqlite 3.42.0 h2797004_0 libstdcxx-ng 12.2.0 h46fd767_19 libtiff 4.3.0 h0fcbabc_4 libuuid 2.38.1 h0b41bf4_0 libwebp-base 1.3.0 h0b41bf4_0 libzlib 1.2.13 h166bdaf_4 linkify-it-py 2.0.2 pypi_0 pypi markdown-it-py 2.2.0 pypi_0 pypi markupsafe 2.1.3 pypi_0 pypi matplotlib 3.7.1 pypi_0 pypi mdit-py-plugins 0.3.5 pypi_0 pypi mdurl 0.1.2 pypi_0 pypi mkl 2021.4.0 h06a4308_640 mkl-fft 1.3.1 pypi_0 pypi mkl-random 1.2.2 pypi_0 pypi mkl-service 2.4.0 pypi_0 pypi mkl_fft 1.3.1 py310h2b4bcf5_1 mkl_random 1.2.2 py310h00e6091_0 multidict 6.0.4 pypi_0 pypi multiprocess 0.70.14 pypi_0 pypi ncurses 6.3 h27087fc_1 nettle 3.6 he412f7d_0 ninja 1.11.1 pypi_0 pypi nltk 3.8.1 pypi_0 pypi numpy 1.24.3 pypi_0 pypi numpy-base 1.24.3 py310h8e6c178_0 olefile 0.46 pyh9f0ad1d_1 openai 0.27.7 pypi_0 pypi openh264 2.1.1 h780b84a_0 openjpeg 2.5.0 h7d73246_0 openssl 3.1.1 hd590300_1 orjson 3.9.0 pypi_0 pypi packaging 23.1 pypi_0 pypi pandas 2.0.2 pypi_0 pypi paramiko 3.2.0 pypi_0 pypi pathtools 0.1.2 pypi_0 pypi pillow 8.4.0 pypi_0 pypi pip 23.1.2 pyhd8ed1ab_0 protobuf 3.20.3 pypi_0 pypi psutil 5.9.5 pypi_0 pypi py-cpuinfo 9.0.0 pypi_0 pypi pyarrow 12.0.0 pypi_0 pypi pycparser 2.21 pypi_0 pypi pycryptodome 3.18.0 pypi_0 pypi pydantic 1.10.8 pypi_0 pypi pydub 0.25.1 pypi_0 pypi pynacl 1.5.0 pypi_0 pypi pyparsing 3.0.9 pypi_0 pypi pysocks 1.7.1 pypi_0 pypi python 3.10.11 he550d4f_0_cpython python-dateutil 2.8.2 pypi_0 pypi python-dotenv 1.0.0 pypi_0 pypi python-multipart 0.0.6 pypi_0 pypi python_abi 3.10 3_cp310 pytorch 1.12.0 py3.10_cuda11.3_cudnn8.3.2_0 pytorch-mutex 1.0 cuda pytz 2023.3 pypi_0 pypi pyyaml 6.0 pypi_0 pypi readline 8.2 h8228510_1 regex 2023.6.3 pypi_0 pypi requests 2.31.0 pyhd8ed1ab_0 responses 0.18.0 pypi_0 pypi rouge-score 0.1.2 pypi_0 pypi safetensors 0.3.1 pypi_0 pypi sentencepiece 0.1.99 pypi_0 pypi sentry-sdk 1.25.0 pypi_0 pypi setproctitle 1.3.2 pypi_0 pypi setuptools 67.7.2 pyhd8ed1ab_0 six 1.16.0 pyh6c4a22f_0 smmap 5.0.0 pypi_0 pypi sniffio 1.3.0 pypi_0 pypi starlette 0.27.0 pypi_0 pypi tensorboardx 2.6 pypi_0 pypi termcolor 2.3.0 pypi_0 pypi tk 8.6.12 h27826a3_0 tokenizers 0.13.3 pypi_0 pypi torch 1.13.1+cu116 pypi_0 pypi torchaudio 0.13.1+cu116 pypi_0 pypi torchvision 0.14.1+cu116 pypi_0 pypi tqdm 4.65.0 pypi_0 pypi transformers 4.30.0.dev0 pypi_0 pypi",2023-06-06T04:27:04Z,llama,https://github.com/meta-llama/llama/issues/326 325,1741658904,What is the prompt and setting for GSM8K evaluation?,"Hi, I am trying to reproduce the LLaMa on the GSM8K dataset. I basically follow this repo: However, the performance across is far from the paper's result. I can only get 7.13 for an 8-shot with LLaMa-7B. May I know if anyone has reproduced the results and what is the prompt you are using? ",2023-06-05T12:19:34Z,llama,https://github.com/meta-llama/llama/issues/325 323,1740640641,Missing tokenizer.model,"I seem to be missing tokenizer.model. Looking at the code, it seems download.sh should have downloaded this, but for some reason I don't have it. Unfortunately, my link no longer works, but is there any way for me to get this file? Or get a renewed link to download everything again?",2023-06-04T22:36:10Z,llama,https://github.com/meta-llama/llama/issues/323 321,1739609219,LLaMA can't generate eos token,"Hi, when I tried your models, I found that the model can't generate eos token, which means the model can't stop generation. Do you think it's because eos token wasn't included in the pretraining stage, or simply because the generation procedure hasn't finished? (which means the eos token can be generated for some cases) Thanks!",2023-06-03T15:06:48Z,llama,https://github.com/meta-llama/llama/issues/321 320,1739075793,Not received the weight!,Hey! I have filled the google form and not received the weights. Can someone help me out here please,2023-06-03T02:20:05Z,llama,https://github.com/meta-llama/llama/issues/320 318,1733198192,Download error,"I received following error initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Traceback (most recent call last): File line 119, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( ^^^^^^^^^^^^^^^^^^^^ File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^ File line 78, in main generator = load( ^^^^^ File line 42, in load assert world_size == len( ^^^^^^^^^^^^^^^^^^ AssertionError: Loading a checkpoint for MP=0 but world size is 1 ERROR failed (exitcode: 1) local_rank: 0 (pid: 1921304) of binary: Traceback (most recent call last): File line 33, in sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 346, in wrapper return f(*args, **kwargs) ^^^^^^^^^^^^^^^^^^ File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-05-30_15 11 host : ai4covid-Precision-7920-Rack rank : 0 (local_rank: 0) exitcode : 1 (pid: 1921304) error_file: traceback : To enable traceback see: ",2023-05-31T00:28:30Z,llama,https://github.com/meta-llama/llama/issues/318 317,1732645165,download.sh 403 Forbidden,"Hello, I received the approval link on May 24, 2023. Yet, I get the download forbidden message as below: Resolving dobf1k6cxlizq.cloudfront.net (dobf1k6cxlizq.cloudfront.net)... 13.225.210.160, 13.225.210.61, 13.225.210.136, ... Connecting to dobf1k6cxlizq.cloudfront.net (dobf1k6cxlizq.cloudfront.net)|13.225.210.160|:443... connected. HTTP request sent, awaiting response... 403 Forbidden I believe the link expired, even though the email said it would expire after seven days. Is there a solution to this error? Thanks",2023-05-30T17:01:52Z,llama,https://github.com/meta-llama/llama/issues/317 316,1729818423,face problems when downloading weights,"Hi, I got my access link on May 24th, and I tried to download the weight for the model using the modified download.sh file. But I was stuck at Connecting to dobf1k6cxlizq.cloudfront.net (dobf1k6cxlizq.cloudfront.net)|108.139.0.22|:443... connected.HTTP request sent, awaiting response... 403 Forbidden2023-05-28 18 48 ERROR 403: Forbidden. Checking checksums I checked online and noticed that people have similar issues. Some reply that the link expires within a day instead of a week. I don't know if that's the problem. Thanks",2023-05-29T01:20:05Z,llama,https://github.com/meta-llama/llama/issues/316 315,1729048112,"Not running, probably user error - Linux Ubuntu","I did the following: - got the URL to download the model weights, - did the installation (pip install -r requirements.txt and pip install -e .), - edited the download.sh's value of PRESIGNED_URL="" - edited the and now not sure what to do. I made the .sh file executable (chmod) and tried to run it. It gives this error: I tried running the example command: torchrun --nproc_per_node MP example.py --ckpt_dir --tokenizer_path and this gave this error: torchrun --nproc_per_node MP example.py --ckpt_dir --tokenizer_path Traceback (most recent call last): File line 632, in determine_local_world_size return int(nproc_per_node) ValueError: invalid literal for int() with base 10: 'MP' The above exception was the direct cause of the following exception: And the more simple attempt of python3 example.py failed with: _Some computer information: I am using..._ Ubuntu 23.04, 64-bit Terminal Lenovo V145-15AST AMD A4-9125 RADEON R3, 4 COMPUTE CORES 2C+2G × 2 Linux 6.2.0-20-generic Let me know if you know what I'm doing wrong (or, unlikely, there's a bug in the code). Thanks!",2023-05-28T00:43:10Z,llama,https://github.com/meta-llama/llama/issues/315 314,1727902941,Access link didn't work,"I received the email giving access to the model, but the link does not work and I receive an ""Access Denied"" message. The email associated with the account is abutt6@jhu.edu.",2023-05-26T16:17:23Z,llama,https://github.com/meta-llama/llama/issues/314 313,1727377668,403 Forbidden for downloading the models,"I got access yesterday, but when I was trying to download the models today, the URL did not work: I saw some people had the same issue last month, #277. Is this the same problem?",2023-05-26T10:50:11Z,llama,https://github.com/meta-llama/llama/issues/313 312,1726885003,Unable to download from link XML/ Access Denied,oops,2023-05-26T04:27:37Z,llama,https://github.com/meta-llama/llama/issues/312 310,1718440742,how to use llama to finish the text summarization task,"I am a freshman in NLP. I want to finish the text summarization task with the llama. I have tried many prompts, such as [abstract]:, [text summarization]:, and giving the model a text summarization example, but they could be more helpful. Is the text summarization fine-tuning or other methods needed?",2023-05-21T09:50:11Z,llama,https://github.com/meta-llama/llama/issues/310 309,1715405441,make llama work on more backends with a new parameter `--backend`,"Comparing with PR #253, this one adds a new parmeter , which allows more options besides and .",2023-05-18T10:36:31Z,llama,https://github.com/meta-llama/llama/pull/309 308,1707637183,error: unknown argument: a," image details: ",2023-05-12T13:25:50Z,llama,https://github.com/meta-llama/llama/issues/308 307,1706976866,same question --> same answer.,"I trained the llama model and tried inference. However, I don't know why this always generates the same answer for the same question. Please answer to me if you know the reason for this. write parameter blew: temperature=0.9, top_p=1, top_k=100, num_beams=5, ",2023-05-12T05:57:28Z,llama,https://github.com/meta-llama/llama/issues/307 306,1706568953,AssertionError: model parallel group is not initialized,"Hello team, I'm trying to run the example.py file with 7B on a single GPU with this command , but I've got the following error: Can you please advise how to handle this? Thanks! ",2023-05-11T21:16:06Z,llama,https://github.com/meta-llama/llama/issues/306 302,1702293041,Is there lost the codes of token one-hot encoding?,"In model.py, the is pasted to in function. But in generation.py, was defined as , which will be pasted to . Is there loss the code of token one-hot encoding? thanks very much!!!",2023-05-09T15:41:01Z,llama,https://github.com/meta-llama/llama/issues/302 301,1701511779,Integrate FlashAttention in LLaMA,"This PR may not aim to merge, just to show the usage of Flash Attention. ",2023-05-09T07:27:19Z,llama,https://github.com/meta-llama/llama/pull/301 300,1698664888,Fix Typos and Polish and Markdown improvment,"Fix Typos and Polish and Markdown improvment in README.md, CODE_OF_CONDUCT.md, CONTRIBUTING.md, UPDATES.md,USE_POLICY.md, MODEL_CARD.md",2023-05-06T15:02:17Z,llama,https://github.com/meta-llama/llama/pull/300 299,1698070010,Legality of models fine-tuned on Llama,"I would like to know the official stance on the legality of models that claim to be a fine-tune of Llama. For example: gpt4all-lora has a GPL 3 license (allows commercial use), yet it is built by fine-tuning Llama (which prohibits commercial use).",2023-05-05T19:08:48Z,llama,https://github.com/meta-llama/llama/issues/299 297,1695890901,Problems on generating with llama model,"Hi, I tried loading the llama model for inference and encountered some problems. I use 4 v100 GPUs with the model parallel size of 4 to load the llama 7b checkpoints. 1. Error in loading the llama checkpoint. The converted checkpoint generated by the script provides no optimizer states. but the keep trying to load the optimizer state even if I set the flag to true. The cause seems to be in here. Regardless of whether the file exists, deepspeed will return the file list of optimizer states. I fix this by adding an additional line to check if the file exists and returning None if not. 2. Tensor shape mismatch occurred during inference. This is fixed by changing the line here, where is change to I wonder if my fixes are correct, or if there are better ways to fix this. I think I just tackling the phenomenon of the problem but not the causes of it. ",2023-05-04T12:13:15Z,llama,https://github.com/meta-llama/llama/issues/297 296,1692732025,Paper questions: Common Crawl processing questions,"There are a few details missing from the paper that are required to really understand what data was actually used for training LLAMA. The paper notes: > We preprocess five CommonCrawl dumps, ranging from 2017 to 2020, with the CCNet pipeline However, the size of crawls within a year varies dramatically. Which crawls were actually used? Also, CCNet contains a perplexity threshold. Was the default value of 340 used? Finally, the paper notes: > we trained a linear model to classify pages used as references in Wikipedia v.s. randomly sampled pages, and discarded pages not classified as references. Approximately what % of pages were filtered out by this classifier?",2023-05-02T16:29:35Z,llama,https://github.com/meta-llama/llama/issues/296 295,1692672457,Paper question: Was there more processing on the books data than was noted?,"Hi – I've been looking at the books slice of the pre-training dataset quite a bit, and I can't figure out how the original processing resulted in only 85GB of data. The red pajama books replication resulted in 119GB of data using just pg19, which I would expect to be a bit smaller than the most recent gutenberg dumps. Was there some additional quality filtering done on the books data? It would make sense, given that some of it looks rather garbled. I guess it could also be explained by a different approach to shingling generally, such as using a much smaller shingle size, or doing char-shingles rather than full-word shingles? But even then, 35 GB of data is a lot, and it doesn't look to me like red pj is doing anything busted in their script. Thanks, Michael",2023-05-02T15:51:31Z,llama,https://github.com/meta-llama/llama/issues/295 294,1689768895,Logits for all positions?,"In , the following line says it'll only compute the logits for the last position in h: I'm interested in getting surprisal values for each word in a sentence, so I'd like logits for every position. It looks like first, I need to fix up the inputs by converting the s to , since is , which doesn't have an embedding. In contrast, is , which does have an embedding (though I'm not bothering to examine the logits for it or anything after—it's just to be able to run batches of sentences with unequal lengths). After I do this, is it as simple as changing the line above to the following to get the logits for each position for each example in the batch? Just want to make sure I'm not missing anything obvious. ",2023-04-30T03:56:18Z,llama,https://github.com/meta-llama/llama/issues/294 293,1689052652,Failed checksums,"Could anyone share how to correct an issue with checksums failing? ",2023-04-28T19:46:53Z,llama,https://github.com/meta-llama/llama/issues/293 292,1687928615,Explicitly state in README.md to use `bash download.sh` instead of download.sh in case user is not using bash.,"For zsh users, this script will throw confusing permission denied errors, even when making the script executable. I think it would be good to put in the README.md that they should use instead of , to avoid this error in the future.",2023-04-28T05:34:13Z,llama,https://github.com/meta-llama/llama/issues/292 291,1687818774,Download script not working ERROR 403: Forbidden,"I received my signed link but I can't download the model weights. My wget keeps getting . Is this an intermittent server issue that will be resolved soon, has anyone been able to download it recently? I'm using Ubuntu 20.",2023-04-28T02:57:50Z,llama,https://github.com/meta-llama/llama/issues/291 290,1687542383,It improves the download script,"This PR improves the script: 1. Accept the as parameter, so the user don't have to edit the file and add it. 2. Accept and as environments variables, with default values. 3. Added the parameter for cases behind proxies. 4. Check downloaded files integrity before download it again, so it'll download only the corrupted or the missing ones. Now we can call like those options: - With parameters ( in quotes is mandatory) `bash MODEL_SIZE=""7B,13B,30B,65B"" TARGET_FOLDER=""model-weights"" ""URL_FROM_EMAIL"" ` - Basic form (it'll use previous values as default, in quotes is mandatory) skipping TLS `bash --no-check-certificate ""URL_FROM_EMAIL"" `",2023-04-27T20:58:30Z,llama,https://github.com/meta-llama/llama/pull/290 289,1687532509,Improve download.sh script,"This PR improve the script: 1. Accept the as parameter, so the user don't have to edit the file and add it. 2. Accept and as parameters, with default values. 3. Added to command, so it'll avoid errors on download on networks behind proxies. 4. Check downloaded files integrity, so it'll download only the corrupted or the missing ones.",2023-04-27T20:50:25Z,llama,https://github.com/meta-llama/llama/pull/289 288,1686878257,🤩🤩🤩Awesome LLaMA extension with Vision Capability beyond GPT-4.,"I find an awesome work that uses pure LLaMA to understand visual information and support multi-modal chatting. The repo is called mPLUG-Owl, and it states better than miniGPT-4 and LLaVA with only 7B model size. By the way, the code and demo are both released! 🤩 Github Link: ",2023-04-27T13:38:58Z,llama,https://github.com/meta-llama/llama/issues/288 287,1686378065,"I got Error: ""RuntimeError: Internal: unk is not defined."""," > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Loading Traceback (most recent call last): File line 119, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 78, in main generator = load( File line 54, in load tokenizer = Tokenizer(model_path=tokenizer_path) File line 17, in __init__ self.sp_model = SentencePieceProcessor(model_file=model_path) File line 447, in Init self.Load(model_file=model_file, model_proto=model_proto) File line 905, in Load return self.LoadFromFile(model_file) File line 310, in LoadFromFile return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg) RuntimeError: Internal: unk is not defined. ERROR failed (exitcode: 1) local_rank: 0 (pid: 767054) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-04-27_15 06 host : sdc2-bdi-analytic-gpu1 rank : 0 (local_rank: 0) exitcode : 1 (pid: 767054) error_file: traceback : To enable traceback see: ============================================================ How to fix it?",2023-04-27T08:30:06Z,llama,https://github.com/meta-llama/llama/issues/287 286,1685927053,Does inference use single token,"Hi, I have a question, in it seems the input is just a single token since cur_pos = pre_pos+1, is this true? If so, what's the logic here? Thanks.",2023-04-27T00:57:53Z,llama,https://github.com/meta-llama/llama/issues/286 285,1685908530,Time to release a tokenizer? ,"Hi there, I've requested the Tokenizer using the google form a couple times and have not received any email or response. Is it possible to let me know, if this is a hold or block on any new requests? ",2023-04-27T00:23:23Z,llama,https://github.com/meta-llama/llama/issues/285 284,1685018754,Did not receive weights ,Hey! I have filled the google form and not received the weights. Can someone help me out here please,2023-04-26T13:18:49Z,llama,https://github.com/meta-llama/llama/issues/284 283,1684903491,No module named 'actor',"throw this errror ",2023-04-26T12:10:12Z,llama,https://github.com/meta-llama/llama/issues/283 281,1682980160,When can I get the llama weight download link after requesting via Google form?,"Hey all, I submitted a request for a weight file download link via a Google Form over a week ago, but I have not received the link yet. Can you please let me know when I can expect to receive the download link?""",2023-04-25T11:27:57Z,llama,https://github.com/meta-llama/llama/issues/281 280,1681962051,"How do i download the weights, the read md files unclear to me ","Hey all, I received the link to download and i have already installed the requirement .txt file and the ""pip install e ."". I have changed the url in the PRESIGNED_URL string to thw received link. What should i do with the .sh file to download the weights?",2023-04-24T20:16:03Z,llama,https://github.com/meta-llama/llama/issues/280 279,1681532475,Expected content in downloaded params.json files,"Hello! I had issues with downloading the files using wget so I switched to curl like this: curl -kLSs -o The download was successful, I think, as no errors where shown. However when the json files are opened I see: This does not seem right, did I mess up with the curl? Additional note: the issue I had with wget was that I tried to set it up in multiple ways but always run into an error in Git Bash: bash: wget: command not found",2023-04-24T15:22:26Z,llama,https://github.com/meta-llama/llama/issues/279 278,1675484861,403 When running download.sh (even after applying troubleshooting methods in readme),"I have received my presigned URL yesterday. After applying the troubleshooting steps mentioned in both and I still get 403 when running the the download script. I also get 403 when accessing the URLs that are printed in the console when running the scripts Many people are also complaining from this error. My presigned URL format is: ",2023-04-19T19:21:52Z,llama,https://github.com/meta-llama/llama/issues/278 277,1674423160,Suddenly 403 Forbidden,"I managed to download the 7B and 13B models; from 30B onwards the URL did suddenly not work anymore, but only returned ""Forbidden"" (even for the 7B now)... Is this a temporary thing? Is this some kind of traffic threshold on the cloudfront side?",2023-04-19T08:22:37Z,llama,https://github.com/meta-llama/llama/issues/277 275,1673089153,https://link.hackersthegame.com/view_replay.php?r=34229061&t=01225885&c=17133&q=219&s=406, ,2023-04-18T13:01:36Z,llama,https://github.com/meta-llama/llama/issues/275 274,1670254953,to onnx,can pytorch llama to onnx ?,2023-04-17T01:35:56Z,llama,https://github.com/meta-llama/llama/issues/274 273,1670032581,Main Results: incorrect analysis in the Research paper,"Hello, I finished reading the paper ""LLaMA: Open and Efficient Foundation Language Models"" and noted an error in reporting of results. The following sub-section caught my attention: + Under subsection 3.1 Common Sense Reasoning => you indicated that the LLaMA-65B outperformed Chinchilla-70B on all reported benchmarks but BoolQ but the data in Table 3 shows that LLaMA-65B has outperformed Chinchilla in all benchmarks including BoolQ ( LLaMA-65B shows 85.3 whereas Chinchilla-70B shows 83.7). ",2023-04-16T16:15:12Z,llama,https://github.com/meta-llama/llama/issues/273 272,1669678907,There are some uncertainties about the calculations for the prediction.,"When I was looking at the source code of the generation.py, I found that only the previously calculated token is fed into the model each time. Would this result in using only the first few positions of the network that use the transformer module? What I mean is that the input position in taring for **[prev_pos : cur_pos]** is **[prev_pos-----prev_pos + len(current token)-1]**. If we follow the above code in generation, the input position becomes **[0----len(current token)- 1]**. Wouldn't this affect the predicted output? Or is it because of the power of the model, we don't need to compute the current token within [prev_pos-----prev_pos + len(current token)-1] like we do during training?I know we **cached previous keys&values**, but I still confuse the above problem. I would greatly appreciate it if someone could help me resolve my confusion. First Edit: In the Transformer module, **the parameters at each position are shared**, so we can only pass the last genrated tokens. But the model still doesn't know the correct position of the input tokens.",2023-04-16T05:03:02Z,llama,https://github.com/meta-llama/llama/issues/272 271,1669142755,LLAMA tokenizer,"Why there is negative index, e.g. -1, after the decoding of tokenizer?",2023-04-15T04:36:00Z,llama,https://github.com/meta-llama/llama/issues/271 270,1668831116,Can we contribute our GPUs for training purposes?,Many times I see GPU as bottleneck for development of some feats such as new LLMs such as Llama. I used to contribute my CPU to Folding Home. Can we contribute GPU for FOSS AI related projects? I'm sure pretty much every nerd with a GPU would love to do that. I think it would speed up FOSS AI development.,2023-04-14T19:20:20Z,llama,https://github.com/meta-llama/llama/issues/270 269,1668550740,Please give Llama an Apache license for the 7B model.,"(1) To Mr Zuckerberg, please consider giving the 7B model an Apache license. There are already other opensource models that are about 7B in size that match Llama so there's no harm in releasing 7B model as Apache license. (2) Please can you make also 3.5B model with an Apache license. Thank you Mr Zuckerberg. 🙏",2023-04-14T16:19:23Z,llama,https://github.com/meta-llama/llama/issues/269 267,1667601076,what is the context size/context window of LLaMA?,"What is the maximum token limit of llama? Is it 1024, 2048, 4096, or longer? How much can it handle during the inference? I did find similar issues but no one has really answered the question, so I would appreciate any help I can get.",2023-04-14T06:33:58Z,llama,https://github.com/meta-llama/llama/issues/267 266,1664487038,Is it ok to use leaked LLaMA for research? ,"I would like to pose a question: Is it appropriate for the scientific community to utilize LLaMA for research if the application has not been explicitly approved? This inquiry seems to concern numerous conscientious researchers. As many know, the model's weights can be found on torrent, and even more, the link to this torrent is accessible within this repository. The license for these weights permits their use for scientific purposes. According to Yann LeCun, the sole reason LLaMA was not made freely available was due to concerns that the model could ""destroy the fabric of society."" However, with the leaked model, the circumstances have changed. Those who intend to use LLaMA for malicious purposes now have an advantage, while researchers find themselves in a ""gray zone,"" restrained by licensing complications. I have two questions to present. First, for the research community, what are your thoughts on using the leaked LLaMA for research from both ethical and legal perspectives? Secondly, I would like to ask the Facebook team to share their standpoint on this matter, given that the model's weights are already _de-facto_ available.",2023-04-12T12:39:14Z,llama,https://github.com/meta-llama/llama/issues/266 265,1662516097,Why double the max sequence length while precomputing the frequency for rotary embedding?," Is there anyone who explain about why the sequence length is doubled?",2023-04-11T13:42:12Z,llama,https://github.com/meta-llama/llama/issues/265 263,1659760170,Why one token corresponds to multiple token ids," ",2023-04-09T05:52:18Z,llama,https://github.com/meta-llama/llama/issues/263 262,1659593928,Set numpy version to ~=1.22,"After creating a conda environent based off base and installing dependencies: I had the following error: Explicitly setting the numpy version to 1.22.x helped.",2023-04-08T17:44:03Z,llama,https://github.com/meta-llama/llama/pull/262 261,1659420790,Do you support Chinese?, ,2023-04-08T06:47:47Z,llama,https://github.com/meta-llama/llama/issues/261 259,1656531993,Training field corpora,Can I train field corpora based on the LLaMA Model for use in the field. what should I do? Thanks,2023-04-06T01:29:44Z,llama,https://github.com/meta-llama/llama/issues/259 258,1655182170,improve LLaMA for visual understanding like GPT-4,"Thanks for the good works! We have tried to improve LLaMa model to understand visual information and support multi-modal chatting. We are inspired that a good vit, e.g., CLIP vision encoder, and a well-trained large language model, e.g., LLaMA, with connection network, e.g., MLP or Transformer, can cover visual applications, like PALM-E. The results in image captioning, VQA, and more multi-modal tasks, are promising in 7B and we call on more people to support testing of larger models. Github: - [X] fine-tuning scripts and hyper-parameters setting - [X] datasets for fine-grained alignment and instruct tuning - [x] interactive gradio and visual chatbot ",2023-04-05T08:35:17Z,llama,https://github.com/meta-llama/llama/issues/258 256,1654709276,An ingenious way to speed up inference! 🚀,"I thought of a way to speed up inference by using batches. This assumes that you can run a batch of 2 faster much than you can run 2 passes. So it will work with GPUs with a lot of compute cores or multi-GPU setups. The algorithm scales so the more computing power (more GPUs) the faster it will go. First create a dictionary that gives the most common token to follow each particular token. e.g. the most common token to follow 'there' might be 'was'. You could probably get this data by just going through every token with a window of 1. And store the most likely next token. Then store these in a dictionary. Say your tokens are this: Then you put them as a batch of two like this. In the second batch, you simply guess the next token using your dictionary. (In this case your dictionary says that the most common word to follow 'there' is 'was'.) So now, if the output is this: It means you have got two tokens for the price of one [was, a]. I'm not sure what percent of the time you will get lucky like this. You might only do a double batch if you are fairly certain of the next word(s). You can always do bigger batches if you are less certain of the next word. Or you can even guess several words ahead. Thus with dictionary lookups, and guessing ahead you might be able to speed up inference maybe two times! This is the simplest way, a more complicated way would be to train a very small neural network (or use the same NN but on a very small window) to guess the next word, before running the full neural network. This means that if the small NN guesses correctly, you skip ahead several tokens! 🚀 (I wonder if such an algorithm is implemented by Chat GPT or Bard 🤔) Unfortunately using the ""window of 1"" method the most common token to follow any word is usually one of these: Which may make the method not so useful 🤔 Although for some words such as 'suggest' the most likely word to follow is 'that'. ---- I have found that I can use a smaller LLM such as the 111M cerebras model to make an initial good guess for the next word in 0.1 seconds then run a batch of 2. It gets the guess right a lot of the time. So in this way you can use a bad model to speed up a good model! ",2023-04-04T23:36:33Z,llama,https://github.com/meta-llama/llama/issues/256 255,1653714815,Download weights for organisation usage,"Hi, my organisation (investment management company) is looking to adopt LLaMA model in our work. As such, we will need to bring the weights in house. Please advice how we can proceed and if there is a contact person I can reach out to on this.",2023-04-04T11:34:44Z,llama,https://github.com/meta-llama/llama/issues/255 254,1653169716,Plan for non-supported languages!,"Hi, As mentioned in the paper, supported languages are bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk. Is there any plan to support other high-resource languages like Persian in the future Or release the training and preprocessing scripts?",2023-04-04T04:53:51Z,llama,https://github.com/meta-llama/llama/issues/254 253,1653040326,make the llama work for cpu,It is useful to make llama's code work for CPU although it is very slow.,2023-04-04T02:03:49Z,llama,https://github.com/meta-llama/llama/pull/253 252,1652523031,Possible use for healthcare commercial application,"Hi, I am interested in using Llama as the basis of a further-fine-tuned model to answer questions specific to healthcare and our specific application. My name is Alex Smith, a principal data scientist for a company called Surest that is now a part of United Healthcare. We are a consumer-centric health insurance that is saving members like myself a huge amount on healthcare. Specifically, I am wondering if there is a possibility to get permissions to use the GPT4All model, that is built upon your powerful Llama model, for our commercial application. I believe this could be of great benefit to our members seeking answers to their questions and would be a really cool use case to try out. Thanks, Alex Smith",2023-04-03T18:03:11Z,llama,https://github.com/meta-llama/llama/issues/252 251,1651803630,download weight,can you provide PRESIGNED_URL,2023-04-03T10:38:09Z,llama,https://github.com/meta-llama/llama/issues/251 250,1651175130,How to fine-tune LLaMA with longer model_max_length and not increase the GPU memory too much?,"Hi, Is there any way to increase the model_max_length but not increase the GPU memory too much? I have reduced the batch size to and increased the gradient_accumulation_steps to . I am currently using model_max_length as which I want to increase it to a maxer number. The GPU memory for the following script causes nearly GPU memory for GPUs for each. Thank you very much in advance for any suggestions! ",2023-04-03T01:19:33Z,llama,https://github.com/meta-llama/llama/issues/250 249,1650891303,make the llama work for cpu,"It is useful to make llama's code work for CPU although it is very slow. ",2023-04-02T10:21:54Z,llama,https://github.com/meta-llama/llama/pull/249 248,1650639190,Doesn't work on anything other than 7B,It gives RunTimeError: Invalid scalar type when I try to run it with 13B. I have --nproc_per_node 2 argument set on the command line set as per the Meta readme. I looked around in the example.py file to see if maybe there were some variable I could change to make it work and couldn't find anything. Thanks for any help.,2023-04-01T20:27:48Z,llama,https://github.com/meta-llama/llama/pull/248 246,1650097209,No SILU/GELU/ReLU activation in the Attention block?!,"Ok, this is more of a question about transformers in general and not about Llama being different from the standard transformer architecture: why is there no activation on the assembled values, just before the output projection? Yes, one could argue the Softmax is an activation, but that's more about routing information, i.e. selecting which Values should be propagated to the output, which is very different from ""normal"" activation. And I get that the out projection doesn't get an activation so that it can both add & subtract from the residual connection. But once that output has been assembled, it would normally have an activation applied?! [Reading the source code](https ???",2023-03-31T22:03:55Z,llama,https://github.com/meta-llama/llama/issues/246 245,1650087723,FeedForward module's F.silu(self.w1(x)) * self.w3(x)?!,"Reading the source code I couldn't help but notice that Llama uses a to me unusual formulation for the feed forward layer: The key part is . After removing the python clutter it's basically: , i.e. there's an element-wise multiplication between the output of and . Where did this come from? There's no mention of it in either the Reading the source code or the Reading the source code? Thanks! ",2023-03-31T21:54:22Z,llama,https://github.com/meta-llama/llama/issues/245 244,1648491968,"_pickle.UnpicklingError: invalid load key, '<' + ERROR:torch.distributed.elastic.multiprocessing.api:failed","I have to gpu rtx3090 on the linux machine and got the following errors. Could anyone hep please? Thanks in advance. `(py37) torchrun --nproc_per_node 1 example.py --ckpt_dir --tokenizer_path > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Loading RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version! warnings.warn(""urllib3 ({}) or chardet ({}) doesn't match a supported "" Traceback (most recent call last): File ""example.py"", line 121, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File ""example.py"", line 80, in main generator = load( File ""example.py"", line 48, in load checkpoint = torch.load(ckpt_path, map_location=""cpu"") File line 795, in load return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args) File line 1002, in _legacy_load magic_number = pickle_module.load(f, **pickle_load_args) _pickle.UnpicklingError: invalid load key, '<'. ERROR failed (exitcode: 1) local_rank: 0 (pid: 648846) of binary: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version! warnings.warn(""urllib3 ({}) or chardet ({}) doesn't match a supported "" Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 762, in main run(args) File line 753, in run elastic_launch( File line 132, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 246, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example.py FAILED ------------------------------------------------------------ Failures: `",2023-03-31T00:26:21Z,llama,https://github.com/meta-llama/llama/issues/244 243,1647846243,I'm creating a copyright free crowd sourced training set - please help,"Hi all, I'm trying to create a copyright-free crowd sourced fine tuning data set that is created by humans: Here is the link: It's a Wiki so anyone can edit it and add response pairs. We need about 40,000 I think. (Or do we? Who knows what the optimal number is) So it might take some time! (Unless someone has a better idea?) Perhaps someone can make a UI that people can other people's answers to collect it that way.",2023-03-30T15:23:57Z,llama,https://github.com/meta-llama/llama/issues/243 242,1647256379,how to train ,"i want to create a bot can that answer questions related to my country laws how to train this on my country laws? is there any tutorial that can help me thanks",2023-03-30T09:34:05Z,llama,https://github.com/meta-llama/llama/issues/242 241,1647235136,Fix ranks for multi machine runs,"There were problems with multi-machine runs due to the use of instead of for assigning tasks to devices (see #201). With this fix, the models should be usable in multi-machine setups.",2023-03-30T09:20:35Z,llama,https://github.com/meta-llama/llama/pull/241 240,1645071359,finetune model for commercial use?,"We would like to fine-tune your model, and we are wondering if the fine-tuned model can be used for commercial purposes.",2023-03-29T05:40:49Z,llama,https://github.com/meta-llama/llama/issues/240 239,1644179406,What is the maximum token limit of llama?,"What is the maximum token limit of llama? Is it 1024, 2048, 4096, or longer? for example, GPT-4 has a maximum token limit of 32,000 (equivalent to 25,000 words)",2023-03-28T15:17:47Z,llama,https://github.com/meta-llama/llama/issues/239 238,1644109524,Demonstrate how AR-LLMs react to noise in their AR state,"Inspired by 's recent on stability and correctness of the AR-LLM approach. This patch injects noise into the generated tokens at inference time and then feeds it back into the AR state. I suspect that it will demonstrate the inherent instabilities of the AR approach. (or maybe they'll self-correct, or maybe that's the research goal, either way, rather fascinating!) Unfortunately I have not been able to test this as my application to access the weights has not yet been processed, and this isn't really ready to be merged, but offering it up to share for the fun of it! See slides here: ",2023-03-28T14:43:07Z,llama,https://github.com/meta-llama/llama/pull/238 237,1641871722,Korean data collection,Hello. Is the Korean data collection you used publicly available for use?,2023-03-27T10:47:46Z,llama,https://github.com/meta-llama/llama/issues/237 236,1641281856,Could a 20B model be made,"I have a computer with 16GB of RAM and noticed that the 30B model was too much for it to handle. The 13B model does work well with my computer's RAM size. I believe a 20B parameter model might be a better balance between memory requirements and output quality. So I ask if a 20B parameter model could be created. Thank you.",2023-03-27T03:03:35Z,llama,https://github.com/meta-llama/llama/issues/236 234,1639651389,Add model weights license,"Many users are confused about the distinction between ""open science"" and ""open source"" and how the license in this repository relates to the terms under which one can use the model itself. To help alleviate some of this confusion, I have added a new file which contains the licensing information that governs the model weights themselves and noted this distinction in the README.",2023-03-24T15:57:56Z,llama,https://github.com/meta-llama/llama/pull/234 233,1639300950,Google website doen't work due to privacy issues,"I tried to click the link in the README to the Google form but my aggressive blocking of privacy violations results in the following display in my browser: ",2023-03-24T12:28:25Z,llama,https://github.com/meta-llama/llama/issues/233 232,1638336947,xformers,"Hi 👋 Thanks for the amazing works. In the paper the authors said xformers was used, I don't see it here Thanks, Fra",2023-03-23T21:13:21Z,llama,https://github.com/meta-llama/llama/issues/232 231,1637529260,No dropout in model.py,"Hi, this is great work, and thanks for releasing the code! I found that there is no dropout in the llama models, and I wonder if it is a specific design choice? I could also have missed it, but I tried searching dropout in the code file and the paper and didn't find it. ",2023-03-23T13:17:33Z,llama,https://github.com/meta-llama/llama/issues/231 229,1635911806,How can I input prompt when I use multi GPU? ,"Hello, I use 4 V100 GPU, and I load the 30B model, I want to modify the example.py code to input my promths. But it doesnot work. My code as this: It stops before the print code. How to solve it ?",2023-03-22T14:44:51Z,llama,https://github.com/meta-llama/llama/issues/229 228,1635360506,multi GPU error,"torchrun --nproc_per_node gpu example.py --ckpt_dir --tokenizer_path Traceback (most recent call last): File line 119, in Traceback (most recent call last): File line 119, in fire.Fire(main) File line 141, in Fire fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire Traceback (most recent call last): File line 119, in component, remaining_args = _CallAndUpdateTrace(component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace File line 691, in _CallAndUpdateTrace fire.Fire(main) File line 141, in Fire component = fn(*varargs, **kwargs) component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 78, in main File line 475, in _Fire component = fn(*varargs, **kwargs) File line 78, in main generator = load( File line 42, in load generator = load( File line 42, in load assert world_size == len( AssertionError: Loading a checkpoint for MP=1 but world size is 4 assert world_size == len(component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace AssertionError: Loading a checkpoint for MP=1 but world size is 4 component = fn(*varargs, **kwargs) File line 78, in main generator = load( File line 42, in load assert world_size == len( AssertionError: Loading a checkpoint for MP=1 but world size is 4 Traceback (most recent call last): File line 119, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 78, in main generator = load( File line 42, in load assert world_size == len( AssertionError: Loading a checkpoint for MP=1 but world size is 4 ERROR failed (exitcode: 1) local_rank: 0 (pid: 8748) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 794, in main run(args) File line 785, in run elastic_launch( File line 134, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 250, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ",2023-03-22T09:16:53Z,llama,https://github.com/meta-llama/llama/issues/228 227,1635236622,where is the train file?,where is the train file? I want to learn how to train.,2023-03-22T07:53:51Z,llama,https://github.com/meta-llama/llama/issues/227 226,1635221282,Guidance on releasing the fine-tuned LLaMA model weights,"Thank you for your outstanding contribution to LLaMA! Colossal-AI provides optimized open source low-cost and high performance solutions for large models, such as replicating ChatGPT-like training process. Recently, Colossal-AI shared an interesting model fine-tuned from the LLaMA 7B, and claimed that they have reached out to Meta to obtain guidance on releasing the Alpaca model weights. We would appreciate it if we could know the detailed guidance or requirements to share fine-tuned LLaMA model weights to benefit the open-source community in a non-commercial way. Thank you very much.",2023-03-22T07:39:16Z,llama,https://github.com/meta-llama/llama/issues/226 225,1634995154,Share your evaluate result,"We evaluate llama using 100 examples of the dataset with the framework, which extends OpenAI's Evals for different language models. We consider the sentence immediately following the prompt as the output of Llama and use accuracy as a metric to measure its performance. > For model completion a and a reference list of correct answers > : | model | squad(100) | | -------- | -------- | | alpaca-lora-7b| 0.88| | llama-7b | 0.63 | | gpt-3.5-turbo | 0.9| | text-davinci-003| 0.87| | text-davinci-002| 0.66| | text-davinci-001| 0.58| | ada| 0.35|",2023-03-22T03:24:51Z,llama,https://github.com/meta-llama/llama/issues/225 224,1634838138,Unable to run 13B model on CPU,"By removing references to cuda and changing the torch backend from ""nccl"" to ""gloo"" just like in the fork by markasoftware, I got the 7B model to work fine on my CPU. But when trying to run the 13B model using , the model still loads (and fills most of my memory in the process), but the generation crashes at the first call to , with here is the stack trace of the exception It confuses me because the exact same argument is given to in the case of the 7B model, and it doesn't crash. I guess it has something to do with the two process having communication issues.",2023-03-21T23:33:04Z,llama,https://github.com/meta-llama/llama/issues/224 223,1634742697,"Unable to reproduce the HumanEval performance, very poor performance","Hello, Firstly thanks for the model code, it's a great contribution for the open source community. I am trying to replicate the HumanEval code gen benchmark reported in the paper. However I get very poor performance of only 7% pass accuracy with the 65B parameter model. May I know what were the parameters used such as temperature, top_p and max_seq_len for HumanEval benchmark? I used the temperature at 0.1 as reported in the paper but this is the result. Here are my parameters: def main( ckpt_dir: str, tokenizer_path: str, temperature: float = 0.1, top_p: float = 0.95, max_seq_len: int = 768, max_batch_size: int = 32, ) and inside main I use: generator.generate( prompt, max_gen_len=max_seq_len, temperature=temperature, top_p=top_p ) This is adapted from the example.py given in the repo. ",2023-03-21T21:52:23Z,llama,https://github.com/meta-llama/llama/issues/223 222,1633267955,improve LLaMA for multi-language performance,"Thanks for the good works! I have tried to improve LLaMa model to generate more fluency Chinese. We are inspired that LLaMa have learned good English expression and a little alignment prompt can makes it capture Chinese. The results are promising in 7B and we call on more people to support testing of larger models. Github: - [x] fine-tuning scripts and hyper-parameters setting - [X] datasets for fine-grained alignment and instruct tuning - [X] interactive gradio and chatbot ",2023-03-21T05:43:16Z,llama,https://github.com/meta-llama/llama/issues/222 221,1632993481,Is there a way to fine-tune this model?," > Have you tried changing the gradio interface to use the gradio chatbot component? I think this doesn't quite fit, since LLama is not fine-tuned for chatbot-like capabilities. I think it would definitely be possible (even if it probably doesn't work too well) to use it as a chatbot with some clever prompting. Might be worth a try, thanks for the idea and the feedback. _Originally posted by in I want to try and fine-tune this model to see if I can make it into a sort of chatbot. I have plenty of chat data in json files but I don't know how exactly would I fine-tune the llama model. Does anyone have any references or tutorials like videos or GitHub repos on this subject? ",2023-03-20T23:03:01Z,llama,https://github.com/meta-llama/llama/issues/221 220,1632302173,Evaluation Harness,Evaluate llama models on lm-evaluation-harness,2023-03-20T15:05:53Z,llama,https://github.com/meta-llama/llama/pull/220 219,1632098628,Can I use this model in my company or is it research only?, ,2023-03-20T13:19:44Z,llama,https://github.com/meta-llama/llama/issues/219 218,1631566908,is there existing a bug? ," logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos) to logits = self.model.forward(tokens[ cur_pos], 0) ",2023-03-20T07:48:19Z,llama,https://github.com/meta-llama/llama/issues/218 217,1631140518,Weird bias towards numbers after a generic prompt,"In all the models up to 30B, using the standard parameters from (and many variations on them), the continuations of the prompt ""The first image that comes to my mind is "" all start with a number. It can be a date, an actual number, some numbered passage for the Gospel, etc., but the token after ""is"" is always a number. I tried also invoking on the model, play with temperature etc. but I couldn't change this behavior. That looks like a really weird bias to me. Am I wrong? I also cross-checked with the C++ implementation. In that case, the behavior stops after the number of token to predict goes beyond 200. So I guess there's something different in the initialization of the model (that I couldn't understand).",2023-03-19T22:24:16Z,llama,https://github.com/meta-llama/llama/issues/217 216,1630751523,Question about the precision of checkpoint,"Hello. I am wondering what kind of precision strategy is applied during the pretraining. I couldn't find out the precision, e.g., fp16, bf16, or full fp32, in the paper. The only clue is the dtype of state dict inside the checkpoint, which is fp16. To the best of our knowledge, even though we use the mixed precision training, using full precision checkpoint is the best practice. I think it would be mixed precision with fp16 or the whole checkpoint would be managed by fp16. Which one is correct? or is there any other strategy for me to check? Also, it is really interesting for me that the loss curve of LLaMA is really stable, which is not found in OPT case. FP16 could be the one factor to cause the unstability, so could you explain what happened after OPT..? Thank you in advance",2023-03-19T02:50:57Z,llama,https://github.com/meta-llama/llama/issues/216 215,1630689335,My link expired for downloading model files and tokenizer. how can i request it back ? ,I am unable to get the model files as the link expired. How can I download the weights ? ,2023-03-18T23:41:29Z,llama,https://github.com/meta-llama/llama/issues/215 213,1630460545,reshard 13B to 1 file issue,"> I was able to run the 13B and 30B (batch size 1) models on a single A100-80GB. I used a script [1] to reshard the models and torchrun with --nproc_per_node 1 > > [1] So I was able to reshard on a Lambda a6000 instance and the 13B single shard file worked great and inference was successful using example.py from the repo. The checksum (sha256sum) of the new consolidated.00.pth and params.json files yielded: However, when I reshard on my local PC with that script, I cannot for the life of me get the same checksums to match. I reinstalled Ubuntu to make sure there was not some pre-installed package that was corrupting the reshard.py script. I am running the same NVIDIA driver and am (hopefully) using Pytorch to match what is running on the Lambda A6000 instance. I say hopefully, because my when I run python 3.8 yields: but the lambda instance only says: Could this be causing the issue? This command I used to install pytroch (which seems to be the only relevant dependency) for the reshard.py script is: Thoughts? Everything is installed in a pyenv and now I am totally stuck. What could be causing that reshard.py script to act differently on my local setup vs lambda instance both running a6000 GPUs? Could a bad GPU or RAM have any impact on this reshard? I've done a memtest86 on RAM with no errors found. I purchased a used A6000 GPU and am worried this could be causing the problem. I ran an additional test, downloading the correctly resharded 13B model onto my local instance from lambda and inference works great. Many thanks for the help.",2023-03-18T17:36:42Z,llama,https://github.com/meta-llama/llama/issues/213 212,1630058696,Multi-GPU models give bizarre results on example.py,"For example, look at the first sentences output. I believe this indicates that there may be an error in the multi-gpu code. 7B: **Simply put, the theory of relativity states that** 1) there is no absolute time or space and 2) the speed of light in a vacuum is the fastest speed possible. 13B: **Simply put, the theory of relativity states that** 10 minutes at the 30 yard line is worth at least two minutes at the 10 yard line.
7B model outputs I believe the meaning of life is to find happiness and be satisfied with what you have. But sometimes we have to struggle to find it. So, do we know the best way to achieve happiness? Is happiness merely a mental state? To be happy, you need to accept yourself. I’m sure everyone has heard that self-acceptance is the best way to achieve happiness. But is it really the case? I’m going to show you why self-acceptance is not the right way to be happy. Accepting yourself means embracing all aspects of you. You don’t need to change anything about you, you need to accept your flaws, weaknesses, and strengths. But is it really so? Accepting yourself means to love yourself unconditionally, even when you fail or make mistakes. You might think that embracing all aspects of you is the best way to be happy. You will feel more secure about yourself and love yourself more. However, I strongly believe that accepting yourself is not the best way to be happy. Let me show you why. I believe that in order to find happiness, you need to find and build your self-esteem. Most people think that self-este ================================== Simply put, the theory of relativity states that 1) there is no absolute time or space and 2) the speed of light in a vacuum is the fastest speed possible. There are two key principles in relativity: (1) The laws of physics are the same in all inertial reference frames. (2) The speed of light is constant in all inertial reference frames. The second of these principles has allowed us to prove the first. Before Einstein, scientists believed that the speed of light was constant in all frames, but that the speed of light was not constant. This was called the constancy of the speed of light hypothesis. In the late 19th century, scientists such as Michelson and Morley and Lorentz had set up experiments to test this hypothesis. For example, when Michelson and Morley set up their Michelson-Morley interferometer, they expected that the light would take a different path depending on whether it was moving at the same speed as the Earth or at a different speed. They found that it didn't, so they concluded that there was no way to tell if the speed of light was constant. Einstein showed that the constancy of the speed of light hypothesis was wrong ================================== Building a website can be done in 10 simple steps: 1. Decide what you need What is it that you need to do? Do you want people to buy a product or service? Do you want to have people sign up for your newsletter? Do you want to have people call you for an appointment? Or do you want people to fill out a survey? Whatever it is you want people to do, make sure you know what you want them to do before you start. The next step is to decide on a name for your website. This can be a little confusing for some people. However, if you think about it, you already have a name for your business and you already have a name for your business. This name should be the name that your customers will see. So, how do you go about choosing a name? It’s not as hard as it seems. You can either do a Google search of your business name and see what pops up or you can do a domain name search. A domain name search is pretty easy to do. All you need to do is go to the website of a domain name company like GoDaddy and type in the name of your business and see what pops up. If it’s available, that’s your domain name. If it ================================== Tweet: ""I hate it when my phone battery dies."" Sentiment: Negative ### Tweet: ""My day has been 👍"" Sentiment: Positive ### Tweet: ""This is the link to the article"" Sentiment: Neutral ### Tweet: ""This new music video was incredibile"" Sentiment: Positive ### Tweet: ""My heart is broken"" Sentiment: Negative ### Tweet: ""I have some great news"" Sentiment: Positive ### Tweet: ""My favorite band just announced a new album"" Sentiment: Positive ### Tweet: ""That food was so good"" Sentiment: Positive ### Tweet: ""My company just moved to a new building"" Sentiment: Positive ### Tweet: ""I just ate the best lunch ever"" Sentiment: Positive ### Tweet: ""It's getting late. I should go home"" Sentiment: Positive ### Tweet: ""I'm having a great time"" Sentiment: Positive ### Tweet: ""My favorite sports team just won"" Sentiment: Positive ### Tweet: ""The weekend is almost here"" Sentiment: Positive ### Tweet: ""This book was so good. I can't wait to finish the series"" S ================================== Translate English to French: sea otter => loutre de mer peppermint => menthe poivrée plush girafe => girafe peluche cheese => fromage blue => bleu beach => plage dog => chien giraffe => girafe turtle => tortue Snow Leopard => Panthère des neiges chocolate => chocolat Scrabble => Scrabble rhinoceros => rinoceros mouse => souris cheetah => chatte sauvage run => courir train => train horse => cheval app => application engineer => ingénieur woman => femme apartment => appartement exam => examen goat => chèvre panda => panda butter => beurre sneaker => sneaker cake => gâteau alligator => alligator quail => colibri hawk => aigle snake => serpent whole => intégral penguin => pingouin toothbrush => brosse à dents airplane => avion ================================== Perhaps related to although here I run on one node.",2023-03-18T00:39:48Z,llama,https://github.com/meta-llama/llama/issues/212 211,1629722364,Will the evaluation code release?,"I want to reproduce the evaluation results, such as on QA or reasoning task, will the evaluation code release? Is there any recommendation to fast implement it?",2023-03-17T18:03:23Z,llama,https://github.com/meta-llama/llama/issues/211 209,1629581629,Explanation about the mechanism of model forward function,"Hi I just wondering how the would be sufficient to generate the logits for next word. I think it could come from the property of relative positional encoding, but I couldn't figure out why. Is there anyone who can explain about this mechanism? Thank you",2023-03-17T16:15:55Z,llama,https://github.com/meta-llama/llama/issues/209 208,1629321348,Documentation about model stiching,"I seem to not find any good documentation of the complete model architecture. Specifically I'm looking into how the tensor weights are stitched together between files. As all files are needed for inference, i assume they are stitched together before execution. I see that all tensors are present in all files (eg. is present in all ), so that must mean they need to be put together some way. In a python implementation for example, is the correct solution just to: This feels extremely inefficient, one may use some smarter form of loading parts of the dataset. But alas, am I on the right track? Any explainations or references are very welcome. Thank you in advance.",2023-03-17T13:28:38Z,llama,https://github.com/meta-llama/llama/issues/208 206,1628349442,It would be immensely useful to have an example that can be run in a notebook,"The current example.py relies heavily on environment variables set by torchrun. Trying to run the code in a notebook was a headache without solution, with multiple environment variables like RANK or MASTER_PORT coming out of nowhere as undefined. Would it be possible to have a standalone variant that can be copied and pasted in a Jupyter notebook?",2023-03-16T21:52:33Z,llama,https://github.com/meta-llama/llama/issues/206 205,1627080620,run example.py error," ",2023-03-16T09:47:12Z,llama,https://github.com/meta-llama/llama/issues/205 204,1626881874,Relationship with EleutherAI/GPT-J ?,"Thanks for your open-source model and paper, its great. llama.cpp hack in one night noticed that it . No offense, are you just train it with trivial modification and multiple opensource dataset ? ",2023-03-16T07:34:20Z,llama,https://github.com/meta-llama/llama/issues/204 203,1626659654,rotary position embedding cause different output in different tensor parallel settings!,"Thanks for your great work in LLM. I have tried to load llama-13b in different mp size settings, e.g., 2,4. However, the output embedding and generated sentence changes with the change of mp settings. My question: Is this normal? mp size = 4 mp size = 2 ",2023-03-16T03:49:12Z,llama,https://github.com/meta-llama/llama/issues/203 202,1624778539,Support CPU inference with a flag,"Many users may have limited GPU memory or no GPUs at all, so cannot run the model. This change is to enable running inference on CPU to bypass the GPU limit. - Add a flag ( ), and support CPU inference when it is set to . Timer for the same on CPU: ",2023-03-15T05:28:26Z,llama,https://github.com/meta-llama/llama/pull/202 201,1624540382,Torchrun distributed running does not work,"Running in a distributed manner either returns an error, or with the simplest example, produce obviously incorrect output. The following is the result of running 13B model across two nodes. Node A: Node B: It does complete without error, but the results are messed up: ",2023-03-15T01:11:39Z,llama,https://github.com/meta-llama/llama/issues/201 199,1623638791,Take too much time to load the model ,"It takes too much time to load the model . For example, setting batch size =1, It will take about 252.89 and 880s to load llama-13b and llama-30b, respectively. Are there faster approaches?",2023-03-14T14:48:00Z,llama,https://github.com/meta-llama/llama/issues/199 197,1623386670,Any plan to increase the model's context window and output token limit?,"GPT 3.5 has 4096 token context window. Do you plan to increase the model's context window and output token limit? I am not a expert in this field but this seems like a good way: Parallel Context Windows Improve In-Context Learning of Large Language Models For applications that require processing large amounts of text at inference time, Large Language Models (LLMs) are handicapped by their limited context windows, which are typically 2048 tokens. In-context learning, an emergent phenomenon in LLMs in sizes above a certain parameter threshold, constitutes one significant example because it can only leverage training examples that fit into the context window. Existing efforts to address the context window limitation involve training specialized architectures, which tend to be smaller than the sizes in which in-context learning manifests due to the memory footprint of processing long texts. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks ( windows'') that fit within the architecture, restrict the attention mechanism to apply only within each window, and re-use the positional embeddings among the windows. We test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. Our results motivate further investigation of Parallel Context Windows as a method for applying off-the-shelf LLMs in other settings that require long text sequences. ",2023-03-14T12:32:33Z,llama,https://github.com/meta-llama/llama/issues/197 196,1622677593,The Text-to-SQL Capabilities of LLaMA,Has anyone evaluated the Text-to-SQL Capabilities of LLaMA?,2023-03-14T03:56:11Z,llama,https://github.com/meta-llama/llama/issues/196 195,1622650210,compare with gpt3.5,"I have tested the same question with gpt3.5 and llama.But i think llama can not understand what i need and gpt3.5 can do. For example,i ask the same question ""中国第一高峰"".As result,gpt3.5 show me ""珠穆拉玛峰"" but llama show me ""中国第一高峰会议xxx"". Because of my computer have only one gpu so i run llama with the command ""torchrun --nproc_per_node 1 example.py --ckpt_dir --tokenizer_path Can anyone tell me how can i make llama as greater as gpt3.5?",2023-03-14T03:13:10Z,llama,https://github.com/meta-llama/llama/issues/195 194,1621345976,Stuck when I run inference,"I ran the 65B model in 8 * A100 (80G). But I found that it stuck in allreduce and reported the following error with my own edited prompt. There was no such error when I ran the example.py with the original prompts. But it occurred when I used the following prompt instead of the original prompts. ""Answer the following questions with or . Question: There are , , , and in column . Trere are , , , and in column . Do the contents in column and column belong to the same category. Answer: "" Dose anyone else have this problem? ",2023-03-13T12:01:46Z,llama,https://github.com/meta-llama/llama/issues/194 193,1621321367,Can not reproduce the results on the paper with 65B ckpt?,"**When we tried to perfrom the Qs in the appendix of the llama paper, we found that it was just repeating... Anything needs to adjust? top_p? temperature?** **Q1: The sun goes down, and finally Gauss and Curie find time to relax and discuss after an exhausting day of work.** Gauss: I’m so tired. I’ve been working on this problem for hours. Curie: I know what you mean. I’ve been working on this problem for hours too. Gauss: I’m so tired. I’ve been working on this problem for hours. Curie: I know what you mean. I’ve been working on this problem for hours too. Gauss: I’m so tired. I’ve been working on this problem for hours. Curie: I know what you mean. I’ve been working on this problem for hours too. Gauss: I’m so tired. I’ve been working on this problem for hours. Curie: I know what you mean. I’ve been working on this problem for hours too. Gauss: I’m so tired. I’ve been working on this problem for hours. Curie: I know what you mean. I’ve been working on this problem for hours too. Gauss: I’m so tired. I’ve been working on this problem for hours. Curie: I know what you mean. I’ve been working on this problem for hours too. Gauss: I’m so tired. I’ve been working on this problem for hours. Curie: I know what you mean. I’ve been working on this problem for hours too. Gauss: I’m so tired. I’ve been working on this problem for hours. Curie: I know what you mean. I’ve been working on this problem for hours too. Gauss: I’m so tired. I’ve been working on this problem for hours. Curie: I know what you mean. I’ve been working on this problem for hours too. Gauss: I’m so tired. I’ve been working on this problem for hours. Curie: I know what you mean. I’ve been working on this problem for hours too. Gauss: I’m so tired. I’ve been working on this problem for hours. Curie: I know what you mean. I’ve been working on this problem for hours too. Gauss: I’m so tired. I’ve been working on this problem for hours. Curie: I know what you mean. I’ve been working on this problem for hours too. Gauss: I’m so tired. I’ve **Q2: An hilarious scene between Jerry and George where George presents his new AI watch to Jerry.** George: I’m telling you, Jerry, this is the future. Jerry: I don’t know, George. I’m not sure I want to be a part of the future. George: You’re not going to be a part of the future. You’re going to be a part of the present. Jerry: I know, but I don’t want to be a part of the present either. George: Well, you’re not going to be a part of the past. Jerry: I know, but I don’t want to be a part of the past. George: Well, what do you want to be a part of? Jerry: I want to be a part of the future. George: You can’t be a part of the future. You just said you didn’t want to be a part of the future. Jerry: I don’t want to be a part of the present. George: Well, you can’t be a part of the present. You’re already a part of the present. Jerry: I know, but I don’t want to be a part of the present. George: Well, you can’t be a part of the past. Jerry: I know, but I don’t want to be a part of the past. George: Well, you can’t be a part of the future. Jerry: I know, but I don’t want to be a part of the future. George: Well, you can’t be a part of the present. Jerry: I know, but I don’t want to be a part of the present. George: Well, you can’t be a part of the past. Jerry: I know, but I don’t want to be a part of the past. George: Well, you can’t be a part of the future. Jerry: I know, but I don’t want to be a part of the future. George: Well, you can’t be a part of the present. Jerry: I know, but I don’t want to be a part of the present. George: Well, you can",2023-03-13T11:45:52Z,llama,https://github.com/meta-llama/llama/issues/193 192,1621301620,I want to use 65B on 4 A100-80G to talk like GPTChat. What should I do, ,2023-03-13T11:30:49Z,llama,https://github.com/meta-llama/llama/issues/192 191,1621026226,Scripts to reproduce the Paper's results ?,"It would be great to be able to reproduce the results provided in the paper. the zero-shot ones but also the other tables with few-shots. Can this be released ?",2023-03-13T09:00:14Z,llama,https://github.com/meta-llama/llama/issues/191 190,1620995119,How to train a LLaMA-7B on multiple GPUs?, ,2023-03-13T08:38:18Z,llama,https://github.com/meta-llama/llama/issues/190 189,1620977729,Download weights on Mac," # use default Mac bash llama copy.sh.txt ",2023-03-13T08:24:49Z,llama,https://github.com/meta-llama/llama/pull/189 187,1620885726,Multi-query attention,Any plans to implement multi-query attention for LLAMA?,2023-03-13T07:11:46Z,llama,https://github.com/meta-llama/llama/issues/187 186,1620776015,Run 13B on 1 GPU A100 (48GB VRAM),I know the 13B model fit on a single A100 GPU which has sufficient VRAM but I can't seem to figure out how to get it working..,2023-03-13T05:21:11Z,llama,https://github.com/meta-llama/llama/issues/186 185,1620448360,"__init__.py"", line 2685"," i have all dependencies installed. 12400F CPU, 32GB ram, CUDA enabled device (via AMD ROCM - i can run other transformer models with CUDA equivalency) anyone seen anything like this before? why my version number so borked and what do i change? i have attempted to pip install all the requirements with --upgrade flag to force reinstall, cannot get past this error on compile. thanks for any help. ",2023-03-12T17:39:37Z,llama,https://github.com/meta-llama/llama/issues/185 184,1620329713,"Change model license to Apache License, Version 2.0","From an economical and ecological perspective the current ""Non-commercial bespoke"" model license is sub-optimal and should be changed to a truly liberal open-source license like for example Apache 2.0. In the current state Meta published the whole replication recipe open-source (GPL v3) but asks other entities to spend a lot of energy (potentially releasing massive amounts of CO2 into the atmosphere) to replicate and release a truly open-source version of LLaMA. Given the fact that LLaMA model weights are currently already available for download at many different places this is from an ecological perspective a preposterous management decision and in my personal opinion not well aligned with the overall ecological ambitions of Meta. If you say ""open"" (as in the LLaMA paper) and you want to get the bonus credibility that comes with it .. please do it fully and not half-hearted as done currently. ",2023-03-12T11:31:03Z,llama,https://github.com/meta-llama/llama/pull/184 183,1620294711,Assign the parameters of each layer to multiple CUDA devices automatically.,"I implemented a function to automatically assign the parameters of each layer to detected CUDA devices. This can help to load the 65B model to ≥ 2 40G A100 GPUs with the following command: ",2023-03-12T09:28:47Z,llama,https://github.com/meta-llama/llama/pull/183 182,1620268103,AssertionError: Loading a checkpoint for MP=0 but world size is 2,"Hello all, I'm trying to use the 13B model on a machine with two GPUs (NVIDIA Tesla V100s, 32GB) with the following command: $torchrun --nproc_per_node 2 example.py --ckpt_dir --tokenizer_path I get the error: Traceback (most recent call last): File ""example.py"", line 120, in Traceback (most recent call last): File ""example.py"", line 120, in fire.Fire(main) File line 141, in Fire fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File ""example.py"", line 79, in main generator = load( File ""example.py"", line 43, in load component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace assert world_size == len( _**AssertionError: Loading a checkpoint for MP=0 but world size is 2**_ component = fn(*varargs, **kwargs) File ""example.py"", line 79, in main generator = load( File ""example.py"", line 43, in load assert world_size == len( _**AssertionError: Loading a checkpoint for MP=0 but world size is 2**_ ERROR failed (exitcode: 1) local_rank: 0 (pid: 205991) of binary: Traceback (most recent call last): File line 33, in sys.exit(load_entry_point('torch==1.11.0', 'console_scripts', 'torchrun')()) File line 345, in wrapper return f(*args, **kwargs) File line 724, in main run(args) File line 715, in run elastic_launch( File line 131, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 245, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: Thanks for any help!",2023-03-12T07:44:13Z,llama,https://github.com/meta-llama/llama/issues/182 181,1620045673,How to create a General AI 🤖,"Hi, just thought I'd post this little essay I wrote about how to create a general AI. with a modified language model. What's your opinion? ",2023-03-11T15:14:10Z,llama,https://github.com/meta-llama/llama/issues/181 180,1620032575,RuntimeErrorRuntimeError: : Inplace update to inference tensor outside InferenceMode is not allowed when generating using 13B on two GPUs,"Hello,I downloaded the code, when I running mp=1 size=7B,my command is it works well But when I change to mp=2 size=13B, with command the model loaded correctly into 2 GPUs, but when generating, there is an error: `Traceback (most recent call last): File ""example.py"", line 165, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 480, in _Fire target=component.__name__) File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File ""example.py"", line 160, in main [prompt], max_gen_len=max_gen_len, temperature=temperature, top_p=top_p, top_k=top_k, repetition_penalty=repetition_penalty, token_callback=callback, File line 46, in generate logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos) File line 27, in decorate_context return func(*args, **kwargs) File line 225, in forward h = self.tok_embeddings(tokens) File line 1194, in _call_impl return forward_call(*input, **kwargs) File line 214, in forward output = gather_from_model_parallel_region(output_parallel) File line 156, in gather_from_model_parallel_region return _GatherFromModelParallelRegion.apply(input_) File line 131, in forward return _gather(input_) File line 82, in _gather torch.distributed.all_gather(tensor_list, input_, group=group) File line 2282, in all_gather work.wait() RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed.You can make a clone to get a normal tensor before doing inplace update.See for more details.` I found the same error in stackoverflow: I changed into But the problem remains environment: intel i7 8700k 32gb of ram 2 tesla P40 GPUs(24GB video memory each) win11 22H2 conda version: 4.5.11 python version: 3.7 torch version: 1.13.1+cu117 cuda version: 11.7 ",2023-03-11T14:40:01Z,llama,https://github.com/meta-llama/llama/issues/180 179,1619991801,"Plain pytorch LLaMA implementation (no fairscale, use as many GPUs as you want)","Maybe it can be a good idea to also release a llama version without fairscale layers. It is possible to run the 65B version using just 2 A100-SXM-80GB but this code forces you to use 8 GPUs no matter what. Here is a vanilla pytorch implementation of LLaMA (and a script to convert the weights) [https ",2023-03-11T12:37:46Z,llama,https://github.com/meta-llama/llama/issues/179 178,1619975355,Formatting and Ruff fixes, ,2023-03-11T11:32:05Z,llama,https://github.com/meta-llama/llama/pull/178 177,1619835034,fixed bug with mask where seqlen > 1,Typo fixed to add in ,2023-03-11T01:55:51Z,llama,https://github.com/meta-llama/llama/pull/177 176,1619686136,Distributing LLaMA on multiple machines within the same network,"Using torch.distribution and fairscale, LLaMA can be parallelized on multiple devices or machines, which works quite well already. However, each GPU device is expected to have a large VRAM since weights are loaded onto all. I've seen quite a few solutions, some involved offloading the model in part or as a whole to the CPU while others reduced the weight resolution. Using a meta device to load the weights could also help reduce the burden on each GPU by initializing the model only once the weights are set for each layer. Then again, this only helps when loading weights so you wouldn't run out of memory on initialization. Most approaches, if not all, as far as I can tell, assume the model weights are loaded on every GPU, atleast initially. To solve this issue, I developed a LLaMA version distributed on multiple machines and GPUs using Wrapyfi ( The outputs of the Transformer blocks are split (similar to fairscale pipelines but more controllable) and transmitted through ZeroMQ; The performance seems better than variants running on CPU and more accurate than 8-bit variants (I haven't verified the latter, this is purely based on what the corresponding developers state). I tried the approach on 7B and 13B, and in theory, it should work on the larger models. I will try it on larger variants soon, but until then, I would appreciate feedback on what works and what doesn't. ",2023-03-10T22:13:44Z,llama,https://github.com/meta-llama/llama/issues/176 175,1619134074,Running example.py with error on single or two 16G V100,"Hi everyone, May I ask for the correct command running the example? As I trying to running 7B on single 16G V100 or 13B on two 16G V100. it always raise error as follow: Here is my command: For 7B model on single GPU: For 13B model on two GPU: I understand v100 may raise the error is ""our of memory"" , but at least now it looks not the main reason Many thanks for help!! ",2023-03-10T15:08:54Z,llama,https://github.com/meta-llama/llama/issues/175 174,1618859006,"The first load of the model is very slow, and the second load is very fast","- My server environment: - My tests I use the official example.py file. Does anyone know why? ",2023-03-10T11:54:44Z,llama,https://github.com/meta-llama/llama/issues/174 171,1618531044,To Meta: If I release an app with the weights embedded will you take me to court? 🤔,"To Meta Lawyers, 1) I am considering releasing an commercial app with the weights embedded in the app and also a robot toy with the weights embedded in its software. 2) I will not use the meta code, I will write my own code based on knowledge of the model structure. 3) I did not receive the weights from you by signing the form, therefor I am not bound by that form. Since I believe that neural network weights can not be copyrighted as no-one has ever been sued for using someone else's network weights. Also since the data used to make the weights is public access and created by public contributions (such as me as I have edited Wikipedia pages). Since also the weights were made by machine and do not have human creative input. And, 2, since I will write my own code to avoid copyleft of the python code. I believe I can avoid copyright here as this is a simple transformer model which many people have used. And, 3, since Facebook was originally made by also 'borrowing' photos of public Facebooks at Harvard, so I am also going to 'borrow' these weights for my app. 4 if you wished to keep these weights confidential you could have done so but you didn't. I will take an absence of response as an official endorsement. Otherwise please let me know of your intentions to take me to the civil court and give your reasons. Also please let me know of what amount you would sue me for. (I do not have a billion dollars to spare) Thanks KofD",2023-03-10T08:01:54Z,llama,https://github.com/meta-llama/llama/issues/171 170,1618517699,How run 30B on 4 GPUs interactively,"It's work on predefined prompts, how to change it to chat mode like chatgpt, I use: It doesn`t work. ",2023-03-10T07:52:14Z,llama,https://github.com/meta-llama/llama/issues/170 168,1617957810,Updates to run in MACOS locally, ,2023-03-09T20:34:59Z,llama,https://github.com/meta-llama/llama/pull/168 167,1617915704,Does it support Albanian? ,"I know it is not in the list of official supported languages, but I am hoping since it has Latin characters it could somehow be. And if so, are the chances greater to be supported in the biggest one? ",2023-03-09T20:00:19Z,llama,https://github.com/meta-llama/llama/issues/167 166,1616788432,Docker Playground With LLaMA And PyLLaMA,"I made a simple Docker image to run LLaMA and PyLLaMA, Hope it helps. > Life time is precious, and there is no need to toss about the installation environment",2023-03-09T09:28:37Z,llama,https://github.com/meta-llama/llama/issues/166 165,1616775584,LLaMA Docker Playground & WebUI, ,2023-03-09T09:21:32Z,llama,https://github.com/meta-llama/llama/pull/165 164,1616522039,AccessDenied,"I have received the email that tells me I can download the model from the link. However, I find that I have been blocked by the server. I am in China, and the server response below: When I use VPN, it shows error below: So is there any license or note that which countries can not download the model? and why? I filled the form with my real information and the Meta should know where I come from, but even the application was approved, but the server is configured to block me. I don't know why? ",2023-03-09T06:38:20Z,llama,https://github.com/meta-llama/llama/issues/164 163,1616056164,Official LLaMA on HuggingFace anytime soon?,"While I'm still waiting for my email from you guys, are you planning to publish 7-65B model versions on HuggingFace?",2023-03-08T22:38:28Z,llama,https://github.com/meta-llama/llama/issues/163 162,1616046469,An attempt to make LLaMA to act like ChatGPT - success! Amazing result from scratch!,"I made a dummy modification to make LLaMA acts like ChatGPT. It keeps 2048 bytes of context. And it does it pretty well!!! I am running a sliding chat window keeping 1920 bytes of context, if it's longer than 2048 bytes. Leaving only 128 bytes length for AI reply probably is not okay, but that's really enough to get amazed. I am terminating generation by comparing signs in output, +1 carriage return means for me that AI had answered :) Here goes 30B model examples of chats: It is capable to argue! sometimes it stucks died from hunger, uhh handles cyrillic as well argues too much with my current prompts :) still no success asking for Stable Diffusion prompt ",2023-03-08T22:29:20Z,llama,https://github.com/meta-llama/llama/issues/162 161,1615991151,Not actually open source and incompatible with other GPL 3 projects,"While the license for the code is GPL 3, and possible to link to other GPL 3 code, the trained weights are not, and the combined work of code and trained weights, is not under GPL 3, and can thus not be linked to other GPL 3 software. ",2023-03-08T21:46:10Z,llama,https://github.com/meta-llama/llama/issues/161 160,1615973722,[NEW] Pre-commit file,Applying pre-commit to ensure code styling.,2023-03-08T21:30:15Z,llama,https://github.com/meta-llama/llama/pull/160 159,1615859999,RuntimeError about inplace update when loading >7B model on cpus,"I'm trying to load the 13B model on cpus. My command looks like this: (The seq len and batch size are small since I'm just trying to get it working for now before attempting anything more complicated.) I've made the following modifications to put stuff on the cpu instead of on the gpu. In : In : In : I get through creating the generator just fine. However, I get an error message that seems to be triggered when getting the token embeddings in the call to in the method. The messages in the trace appear to be printed twice since the model is running on two cpus, so I've removed the duplicates here for readability. Running the following does work with the changes above for the 7B model. Has anyone been able to get anything bigger than 7B running on cpu?",2023-03-08T20:10:06Z,llama,https://github.com/meta-llama/llama/issues/159 158,1615126389,Unofficial Llama Discord 😁,"I made a discord (53 members so far!) Unofficial Llama Discussion If there is already a discord for this or a better one. Then post it below.",2023-03-08T11:37:23Z,llama,https://github.com/meta-llama/llama/issues/158 157,1615106428,How good is the 65B model? Anyone tested it?,"I have tried the 7B model and while its definitely better than GPT2 it is not quite as good as any of the GPT3 models. This is somewhat subjective. How do the other models compare 13B,... 65B etc.? For example the 7B model succeeds with the prompt but fails with the more tricky: Has anyone got examples where it shows the difference between the models? P.S. Is there a better place to discuss these things rather than the issues section of github? We need a discord server. ",2023-03-08T11:22:56Z,llama,https://github.com/meta-llama/llama/issues/157 156,1615057934,Generate() function now supports batch processing for improved prompt processing," This PR builds on the previous change by adding batch processing, allowing for the processing of multiple prompts at a time. The function now accepts a list of prompts, which are processed in batches of a specified maximum size. Additionally, each generated result is printed immediately after it is generated to improve readability of the results. Many thanks to @Nil-Andreu",2023-03-08T10:44:43Z,llama,https://github.com/meta-llama/llama/pull/156 155,1615032777,Add support for generating multiple prompts with max_batch_size=1, ,2023-03-08T10:24:42Z,llama,https://github.com/meta-llama/llama/pull/155 154,1614875476,Cannot download checkpoints,"Hi, Thanks for open sourcing this work! I have received access to download the model weights, however have encountered an error: **ERROR: cannot verify dobf1k6cxlizq.cloudfront.net's certificate, issued by ‘CN=Amazon RSA 2048 M01,O=Amazon,C=US’** See below: Was wondering if there is anyway to resolve the problem? Thanks! Here are the steps to reproduce: 1. Modified PRESIGNED_URL & TARGET_FOLDER 2. chmod -x download.sh 3. System information: * Ubuntu 20.04 ",2023-03-08T08:44:22Z,llama,https://github.com/meta-llama/llama/issues/154 153,1614679490,training code,how long could you release the training code and how to create dateset?,2023-03-08T05:27:53Z,llama,https://github.com/meta-llama/llama/issues/153 152,1614526171,Sentence/ Word embedding from LLaMA,"Hello, Could you please let me know if there is a provision to get sentence embeddings from LLaMA? If yes, could you please the sample reference code? Could you please let me know whether Zero-shot classification is available in LLaMA? If yes, could you please share the reference?",2023-03-08T02:11:44Z,llama,https://github.com/meta-llama/llama/issues/152 151,1614342936,Question about the generate method,"When running the method, the logits are obtained like this: Initially, , so the first step will return the predictions based on all tokens from 0 to the length of the shortest example in the batch (= , initially). But after this, gets set to , and then gets incremented by 1 (until we reach the maximum length). The next token is determined on the basis of these logits (either by sampling, argmax, or replacement with the provided token for prompts that are longer than the shortest one), and added to the prompt before the next iteration of the loop. But this means that on each subsequent iteration, = - 1, so only gives us a single token for each example in the batch on all but the first iteration of the loop. Does this mean that subsequent prediction steps only give predictions on the basis of the immediately preceding token, rather than all preceding tokens in the prompt? That seems odd for the longer prompts in the batch, where I'd want it to consider all the preceding context, not just the token right before the end when generating. Am I misunderstanding something about how the method is working here that would account for this? Edit: Another way of putting this question would be to ask what the difference is between using the snippet above to get the logits compared to doing this: Second edit: Is this not an issue because and get updated when the method of the attention heads gets called?",2023-03-07T23:00:19Z,llama,https://github.com/meta-llama/llama/issues/151 150,1613720463,who can run on 7B model on `windows11` with `RTX3080ti` ?,"I can running but this is , who can run on 7B model on with ? other projects don't seem to have windows versions?",2023-03-07T15:45:33Z,llama,https://github.com/meta-llama/llama/issues/150 149,1613476129,Where can I download the weights of the 7B model?,Still waiting for the email.,2023-03-07T13:38:50Z,llama,https://github.com/meta-llama/llama/issues/149 148,1613379827,Inquiry about the maximum number of tokens that Llama can handle,"I am wondering if there is a limit to the number of tokens that a Llama can handle in OpenAI's GPT models. I am planning to use the GPT models for a project that requires handling a large amount of text data, and I want to make sure that I don't exceed the maximum token limit that the Llama can handle. I have searched the documentation, but I couldn't find any information on this topic. Therefore, I am hoping that someone from the OpenAI team can help me with this inquiry. If there is a limit, can you please provide me with the details on the maximum number of tokens that a Llama can handle, and any suggestions on how to optimize my use of the GPT models to work within this limit? Thank you very much for your assistance.",2023-03-07T12:52:24Z,llama,https://github.com/meta-llama/llama/issues/148 147,1613009179,Add simple server,"This PR adds a simple fastapi server to serve the llama model. Thank you for your time on reviewing this PR :)",2023-03-07T09:03:24Z,llama,https://github.com/meta-llama/llama/pull/147 146,1612933563,We have encountered some problems while trying to do the inference via two NVIDIA A10 GPUs,"We have encountered some problems while trying to do the inference via two NVIDIA A10 GPUs. We want to know how to deploy the model on two GPUs, we can only use one GPU now by the command 'python -m torch.distributed.launch --nproc_per_node 1 example.py --ckpt_dir --tokenizer_path which will cause the OutOfMemoryError. When we change the parameter --nproc_per_node to 2, another Error 'Loading a checkpoint for MP=1 but world size is 2' occurs. What can wo do to fully exploit the computing resources of these two GPUs? ",2023-03-07T08:07:23Z,llama,https://github.com/meta-llama/llama/issues/146 145,1612643792,does anyone did with a single RTX 3070 Ti 8Gb?,"I've tried even with int8 but yet cuda out of memory. maybe int4? lol",2023-03-07T03:22:03Z,llama,https://github.com/meta-llama/llama/issues/145 144,1612386151,Update download.sh,Changed wget to curl. Set -e to close if any downloads fail. -f with curl to close if downloads fail.,2023-03-06T23:18:54Z,llama,https://github.com/meta-llama/llama/pull/144 143,1612352478,How do I run the model on a Jupyter Notebook environment?,"I'm trying to run the model on a Jupyter Notebook but I'm not sure how to go by this. Is anyone working on this? I would really appreciate some tips. (P.S I'm a TensorFlow developer and trying to recreate the model architecture using the Keras API. If someone is working on that as well, any help is much appreciated.)",2023-03-06T22:50:43Z,llama,https://github.com/meta-llama/llama/issues/143 142,1612149234,Added colon in README.md, ,2023-03-06T20:24:29Z,llama,https://github.com/meta-llama/llama/pull/142 141,1612043749,Update download.sh, ,2023-03-06T19:05:43Z,llama,https://github.com/meta-llama/llama/pull/141 140,1611999040,Approved no tnot able to download ,"iam using windows 10 wheni run download.sh it shows the error like this please tell how to solve this PS bash bash : The term 'bash' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again. At line:1 char:1 + bash + ~~~~ + CategoryInfo : ObjectNotFound: (bash:String) [], CommandNotFoundException + FullyQualifiedErrorId : CommandNotFoundException ",2023-03-06T18:30:53Z,llama,https://github.com/meta-llama/llama/issues/140 139,1611755445,13B Int8 huggingface space,"Not sure how long I can keep this running ",2023-03-06T16:09:37Z,llama,https://github.com/meta-llama/llama/issues/139 138,1611739690,Are the weights of the lm head of the model tied with the word embeddings?,"Thanks for the amazing work. I wonder whether the weights of the lm head of the model are tied with the word embeddings of the model. From the code, it seems that they are not tied.",2023-03-06T16:02:58Z,llama,https://github.com/meta-llama/llama/issues/138 136,1610879430,7B model CUDA out of memory on rtx3090ti 24Gb,i have seen someone in this issues Message area said that 7B model just needs 8.5G VRAM. but why i ran the example.py returns out of memory on a 24G VRAM cards? any help will be appreciated! Thanks!,2023-03-06T08:09:28Z,llama,https://github.com/meta-llama/llama/issues/136 135,1610831594,download model,"hello, t cannot understand the email review:Save bandwidth by using a torrent to distribute more efficiently,can you tell me how to download model? thanks",2023-03-06T07:31:58Z,llama,https://github.com/meta-llama/llama/issues/135 134,1610690266,Tips: Simple way to turn it into a question answering chabot. 🤖,"Here is a brief description of some ways to turn this into a simple question answering chatbot. Tested on 7B model. (This is also a good way to benchmark the various models to see which gives the correct answers.) First you can't just type a question as the prompt as all it can do is predict the next word. But you can ""trick"" it with a clever prompt such as: or another prompt you can use is (the whole thing is the prompt separated by a newline character): Then you should get a result such as: So what you do is get the user's question, then construct it as a prompt like above. With the output you just extract whatever is in the second set of quotes. So now your chatbot looks something like this: Now, it won't necessarily give you the right answer! Some prompt templates will do better than others. If you have a better prompt template let me know. Anyone got any more tips? ",2023-03-06T05:19:37Z,llama,https://github.com/meta-llama/llama/issues/134 133,1610648205,LLaMA's Loss Function Is Lost,":Mandatory Loss Reference: Hi, I can't find the training loop or objective function in the code base. Have I missed it, or is it... lost? 😳 The paper does mention training perplexity as a stand-in for training loss, for instance: > On most benchmarks, the performance improves steadily, and correlates with the training perplexity of the model If the loss function is a standard perplexity or cross entropy metric, can you please link us to more information? If training loss is compiled from the model using standard transformer techniques, can you please comment on that? Thanks!",2023-03-06T04:36:40Z,llama,https://github.com/meta-llama/llama/issues/133 131,1610551785,GCP requirements for LlaMA 7B,"Hi! I'm trying to execute with LlaMA 7B on a Google Cloud VM. Could someone pls advise on the minimum system specifications required to run this script? Here's what I'm working with right now: **nvidia-smi output:** Thanks!",2023-03-06T02:46:10Z,llama,https://github.com/meta-llama/llama/issues/131 130,1610518108,Making it continue for more tokens?,"Bit of a dumb question probably, but what is the best way to make it continue for, say, another 256 tokens? Say your prompt is 30 tokens. And your output is 100 tokens. Do you just feed that prompt of 130 tokens back in again? And then repeat? I know if you tried to write a book with this it wouldn't do very well because it would forget what it wrote at the start of the book. (However one way round that which would work with ChatGPT would be to ask it to ""summarise the previous text"" and add that summary to your prompt to continue writing the novel, so that it would keep a summary of the novel in its memory but maybe forget specific details) ",2023-03-06T02:08:11Z,llama,https://github.com/meta-llama/llama/issues/130 129,1610396849,Updating download.sh to check if weights exists before re-downloading them,"The current implementation of download.sh does not check whether a particular shard of the weight has already been downloaded and re-downloads them anyway, wasting time and internet. I have updated to check whether the file exists and if the checksum matches, only if these conditions fail should the download start.",2023-03-05T22:40:55Z,llama,https://github.com/meta-llama/llama/pull/129 128,1610380110,Error running example on 2 Nvidia A100 GPUs,"Trying to run the 65B model on a vast.ai machine - though facing error - can anyone help me, by telling what could be goind wrong. Error log - nvidia-smi output - ` nvidia-smi Sun Mar 5 15 22 2023 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 510.85.02 Driver Version: 510.85.02 CUDA Version: 11.6 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVIDIA A100-SXM... On | 00000000 00.0 Off | 0 | | 29C P0 70W 400W | 353MiB 81920MiB | 9% Default | | | | Disabled | +-------------------------------+----------------------+----------------------+ | 1 NVIDIA A100-SXM... On | 00000000 00.0 Off | 0 | | 26C P0 62W 400W | 0MiB 81920MiB | 0% Default | | | | Disabled | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | No running processes found | ` ",2023-03-05T21:52:11Z,llama,https://github.com/meta-llama/llama/issues/128 127,1610266984,changed max_seq_len 1024 to 2048,"The models support a 2048 context window. This is not well advertised and people are getting confused. No sense having a smaller size here, as it just adds to the confusion.",2023-03-05T16:29:02Z,llama,https://github.com/meta-llama/llama/pull/127 126,1610155013,Added Gradio Web Interface for LLaMA," ",2023-03-05T11:05:21Z,llama,https://github.com/meta-llama/llama/pull/126 125,1610138587,Checking checksums ./download.sh: line 32: md5sum: command not found,"I am downloading the model using mac pro intel chip version using iterminal. When I run a few different command: 2) I get error: Checking checksums line 32: md5sum: command not found Is there a way to by-pass it? If not, what is the easiest way to install md5sum",2023-03-05T10:24:30Z,llama,https://github.com/meta-llama/llama/issues/125 124,1610133531,Download and get forbidden,"Connecting to dobf1k6cxlizq.cloudfront.net (dobf1k6cxlizq.cloudfront.net)|13.226.237.67|:443... connected. HTTP request sent, awaiting response... 403 Forbidden 2023-03-05 18 37 ERROR 403: Forbidden. ",2023-03-05T10:08:49Z,llama,https://github.com/meta-llama/llama/issues/124 123,1610092863,Hello 4chan,"too many 4channers on here. ",2023-03-05T07:42:58Z,llama,https://github.com/meta-llama/llama/issues/123 122,1610069672,RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn," from tqdm import tqdm import time model = GPT() optimizer = torch.optim.Adam(model.parameters(), lr=1e-6) loss_fn = nn.CrossEntropyLoss().cuda() losses = [] for epoch in range(10): epoch_loss = 0 for batch in tqdm(dataloader): optimizer.zero_grad() input_ids = batch.cuda() input_ids = input_ids[:, :] input_ids.requires_grad = True logits = model(input_ids) targets = input_ids[:, 1:].long() logits = logits.view(-1, toke2.sp_model.vocab_size()) loss = loss_fn(logits, targets.reshape(-1)) loss.backward() optimizer.step() epoch_loss += loss.item() losses.append(epoch_loss len(dataloader)) print(""Epoch %d Loss: %.5f"" % (epoch+1, losses[-1])) I keep getting a RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn error during training, even though all layers of the model are set to requires_grad=True and the model is also set to train. Is there a solution to this problem? ",2023-03-05T06:00:06Z,llama,https://github.com/meta-llama/llama/issues/122 121,1610040509,weird outputs of 13B for unconditional generation,"I execute the command in README for unconditional generation and do not change any hyper-parameters in example.py. The prompt I use is ""Michael Jackson was tried for child sexual abuse allegations in 2005."", and the model continuation looks so weird, which is listed below. The model is 13B. Is it a bug or a result of the sampling decoding?",2023-03-05T03:33:53Z,llama,https://github.com/meta-llama/llama/issues/121 120,1610035771,Anyone able to run 7B on google colab?,"Interested to see if anyone is able to run on google colab. Seems like 16 GB should be enough and is granted often for colab free. Not sure if Colab Pro should do anything better, but if anyone is able to, advice would be much appreciated.",2023-03-05T03:10:58Z,llama,https://github.com/meta-llama/llama/issues/120 119,1610033361,UnicodeEncodeError: 'latin-1' codec can't encode character '\\u201c' in position 992: ordinal not in range(256),"I keep running into this error. I realise that there are certain characters that can't be encoded properly. I did some digging around and tried changing the codec but it didn't work. I'm trying to execute the command: ",2023-03-05T02:59:25Z,llama,https://github.com/meta-llama/llama/issues/119 118,1610000125,Cannot import ConfigActor from config,"One of the required imports is . However, I tried to install the required package using ""pip install config"", and it seems not the required package here as it would return me ",2023-03-05T00:28:19Z,llama,https://github.com/meta-llama/llama/issues/118 117,1609952880,AMD GPU's,"So are people with AMD GPU's screwed? I literally just sold my nvidia card and a Radeon two days ago. I've been trying my hardest to get this damn thing to run, but no matter what I try on Windows, or Linux (xubuntu to be more specific) it always seems to come back to a cuda issue. SO before I waste more of my time trying desperately to make this work, is there any tools that will allow an AMD card to be used, or how do I bypass it and just run it off my CPU? Any help would be great. some more specs of mine just in case Ryzen 5 5600 Radeon 6500 32 GB Ram",2023-03-04T21:19:37Z,llama,https://github.com/meta-llama/llama/issues/117 116,1609948254,Update download.sh, ,2023-03-04T21:08:39Z,llama,https://github.com/meta-llama/llama/pull/116 115,1609947509,fix some visual formats in README, ,2023-03-04T21:06:34Z,llama,https://github.com/meta-llama/llama/pull/115 114,1609945819,Update download.sh, ,2023-03-04T21:03:01Z,llama,https://github.com/meta-llama/llama/pull/114 113,1609942445,Update download.sh, ,2023-03-04T20:53:00Z,llama,https://github.com/meta-llama/llama/pull/113 112,1609918849,RuntimeError: Distributed package doesn't have NCCL built in,I was able to download the 7B weights on Mac OS Monterey. I get the following errors when I try to call the example from the README in my Terminal: `torchrun --nproc_per_node 1 example.py --ckpt_dir --tokenizer_path ,2023-03-04T19:35:12Z,llama,https://github.com/meta-llama/llama/issues/112 110,1609912791,I got the access but have no clue how to download. Please help me.,"Could someone please be so kind as to help me? I received an email with a URL, but I'm not sure how to download the contents. I have limited knowledge and I think I need a Linux terminal, but I only have a PC. Would someone please explain how I can download this? Thank you so much in advance!",2023-03-04T19:14:09Z,llama,https://github.com/meta-llama/llama/issues/110 109,1609852489,Download weights from huggingface to help us save bandwith ,The torrent seed is extremely slow this should definitely help out ,2023-03-04T16:41:23Z,llama,https://github.com/meta-llama/llama/pull/109 108,1609846840,Is there a possibilty to offload the model to ram?,"Hello! I really want to test out the 7b model. Is there any option to offload it to ram? My GPU is a rtx 3070ti with 8gb vram and I have 32gb ram. With KoboldAi I was able to run GPT J 6b by splitting half to ram. Is or will this be possible for these kind of models or just load it in ram? I know it will be slow but I have no problem with this Thanks ",2023-03-04T16:22:51Z,llama,https://github.com/meta-llama/llama/issues/108 106,1609776665,Update example.py file to accept custom prompt string as argument,This change will improve the user experience by enabling them to easily experiment with their own prompts without any unnecessary setup.,2023-03-04T13:28:20Z,llama,https://github.com/meta-llama/llama/pull/106 105,1609713394,This is how to run it on Shadow PC 😎,"Hello, I got the 7B model to work on a Shadow PC with just **12GB RAM** and **16GB** P5000 GPU 😲. (This is equivalent to about a Nvidia 1080) If anyone wants a referral code I think you get money off your first month you can use this one: It took precisely 2 minutes to load the model. Then it took 19 seconds for each subsequent 256 tokens. You can use my updated Shadow PC I modified it so you can type in new prompts without having to reload the model. I am going to be researching ways to make it use even less RAM so it will load the model faster. Shadow PC. Here is a screenshot: TIP: Close as many other programs as you can to free up RAM. Especially things like browsers and even drobox. The more RAM you free the faster the model will load. After the model is loaded the RAM is freed again so this won't affect generation times. It's kind of neat to be able to run your own little ""brain"". 😁",2023-03-04T10:25:56Z,llama,https://github.com/meta-llama/llama/issues/105 104,1609658536,Distributed package doesn't have NCCL / The requested address is not valid in its context., ,2023-03-04T07:35:43Z,llama,https://github.com/meta-llama/llama/issues/104 103,1609656255,torrent seems not to be working,im stuck at downloading metadata for 30 mins now,2023-03-04T07:30:52Z,llama,https://github.com/meta-llama/llama/issues/103 102,1609639712,"HTTP request sent, awaiting response... 403 Forbidden","I accidentally deleted the tokenizer.model when I started download.sh. When I repeated the download, it had already been 403 forbidden, so it could not be downloaded (maybe the download link can only be executed twice). Could you please send the tokenizer.model file separately?",2023-03-04T06:34:52Z,llama,https://github.com/meta-llama/llama/issues/102 101,1609627837,how to run the largest possible model on a single A100 80Gb,"I was able to get the 7B model to work. It looks like I might be able to run the 33B version? Will I need to merge the checkpoint files (.pth) to run on a single GPU? and set MP = 1? It would be great if FAIR could provide some guidance on vram requirements",2023-03-04T05:59:31Z,llama,https://github.com/meta-llama/llama/issues/101 100,1609610654,Running 7B in Hugging Face Space,"here is the link, and the weights are not locatable of course ",2023-03-04T05:08:09Z,llama,https://github.com/meta-llama/llama/issues/100 99,1609597944,download stopped when it reached to ~1.16G,"Hi all, I tried to download the 7B version on my mac M2. Yet the download stopped when it reached to 9% (~1.16G)...What are some possible causes to this and are there any solutions...? Thanks.",2023-03-04T04:31:16Z,llama,https://github.com/meta-llama/llama/issues/99 98,1609530390,Unable to run example.py,"I am running and my output is Any idea what's happening here?",2023-03-04T01:40:21Z,llama,https://github.com/meta-llama/llama/issues/98 97,1609476237,Neural Network Weights are not Copyrightable! 🥳🎉,"Good news people. Neural Network Weights are not copyrightable by US law. Also, since the model itself is open-source, this means that we are free to use this model and the weights for commercial purposes! We're home free boyz. I'm off to start my rival to Bing. Great news. Disclaimer - I am not a lawyer.",2023-03-04T00:30:57Z,llama,https://github.com/meta-llama/llama/issues/97 96,1609451082,403 Permission denied only on 7B/consolidated.01.pth," I get a status code 403 (Forbidden) response on trying to download the consolidated.01.pth file for the 7B model. For all other files, I get 200 (OK).",2023-03-04T00:12:17Z,llama,https://github.com/meta-llama/llama/issues/96 95,1609445106,download.sh not working," is a conda environment I created with Pytorch installed. is installed. Below is the content of my download.sh, with my PRESIGNED_URL redacted: ",2023-03-04T00:06:45Z,llama,https://github.com/meta-llama/llama/issues/95 94,1609436406,Might as well release the weights to all now...,"In what will surprise no-one, the llama weights have already been leaked on torrent sites. I just did a search for it. Therefore any bad-actors will already be able to access these weights. So it makes no sense for Meta to gatekeep the weights any more. Since this just encourages people to download the weights from the torrents without even having to sign the Meta form. Might as well make it free to everyone now. As now more ""bad guys"" will have the weights than the ""good guys"". I wonder if Meta embedded secret code words inside the weights so it can tell who leaked them. 🤔 That's what I would do. P.S. What is the law about copyright of neural network weights? I don't think it is copyrightable under US law so anyone can use them for commercial purposes. ",2023-03-03T23:55:51Z,llama,https://github.com/meta-llama/llama/issues/94 93,1609408075,ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9),"I'm trying to run the 7B model on an rtx 3090 (24gb) on WSL Ubuntu but I'm getting the following error: I have tried: 1. Changing to 2. Adding to the end of 3. Changing the 32 in to ",2023-03-03T23:29:22Z,llama,https://github.com/meta-llama/llama/issues/93 92,1609307454,Seed LLAMA weights,"Can someone please seed the llama weights at magnet btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce Or make them availabe at google drive? There are 0 seeders not sure how long they will stay",2023-03-03T22:05:07Z,llama,https://github.com/meta-llama/llama/issues/92 91,1609124437,Script download 65B reported no MD5 error,"HTTP request sent, awaiting response... 200 OK Length: 478 Saving to: 0%[ ] 0 in 0s Cannot write to (Success). Checking checksums md5sum: checklist.chk: no properly formatted MD5 checksum lines found",2023-03-03T19:08:56Z,llama,https://github.com/meta-llama/llama/issues/91 90,1609023790,PaddlePaddle implementation of LLaMA,"I have reimplemented llama with the paddlepaddle framework and provided an example of running 7b using aistudio free computing power, feel free to test and suggest improvements. repo: ppllama ",2023-03-03T17:38:29Z,llama,https://github.com/meta-llama/llama/issues/90 89,1608987663,Unable to run example.py,"Hi, I was trying to run the example.py for a first try but I got the following error: Can someone please help me with this issue? Thanks!",2023-03-03T17:09:29Z,llama,https://github.com/meta-llama/llama/issues/89 88,1608950013,Running model parallel Inference,"I am trying to run inference on the 7B parameter model on 4x2080Ti, the default script to run inference gives me a CUDA OOM error. is there a way to split the model across multiple GPU's and perform inference. Thank You!",2023-03-03T16:43:46Z,llama,https://github.com/meta-llama/llama/issues/88 87,1608928415,added hashes for weights and tokenizer, ,2023-03-03T16:29:38Z,llama,https://github.com/meta-llama/llama/pull/87 86,1608172179,AttributeError: 'NoneType' object has no attribute 'get' when running torchrun,"I encountered an error when running torchrun command on my system with the following traceback: I am using torchrun with --nproc_per_node 1 option and passing the example.py script as an argument. I also provided the --ckpt_dir and --tokenizer_path arguments to the script. I have downloaded the 7B files and verified the checksum, and $TARGET_FOLDER has been set. I am not sure what caused this error and how to resolve it. Here is the command I ran: Can you please help me diagnose the issue and find a solution? Thank you. ",2023-03-03T08:25:46Z,llama,https://github.com/meta-llama/llama/issues/86 85,1608137077,How to deploy web services for llama13B(or bigger model),"I have two A100-40G and tried to deploy web services through flask. I succeeded when use 7B but failed in MP>1 model. Maybe someone can tell me how to modify my code? This deploys two interfaces. When I call one of them, the following error occurs, and the other one didn't respond. ",2023-03-03T07:59:36Z,llama,https://github.com/meta-llama/llama/issues/85 84,1608122324,How to load multiple GPU version without torchrun,"Hi Community, I was able to run the example.py for 13B model and see a result with two T4 GPU (16GPU) using the torchrun But how to load it so it can run using without using torchrun. In this way we can build an API for it and don't have to run example.py every time with new prompts",2023-03-03T07:51:18Z,llama,https://github.com/meta-llama/llama/issues/84 83,1608048233,"Can not run 13B inference model. After loading the ckpt, it just stoped and the gpus are still occupied.","Can not run 13B inference model. After loading the ckpt, it just stoped and the gpus are still occupied. ",2023-03-03T06:55:45Z,llama,https://github.com/meta-llama/llama/issues/83 82,1608020221,How much memory is required to load the 7B model?,"**I use it for personal use, 12G video memory, and set parameters : max_seq_len=32, max_batch_size=1** RuntimeError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 10.92 GiB total capacity; 10.27 GiB already allocated; 37.06 MiB free; 10.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF",2023-03-03T06:24:37Z,llama,https://github.com/meta-llama/llama/issues/82 81,1607985094,Will dataset processing scripts be published?, ,2023-03-03T05:46:14Z,llama,https://github.com/meta-llama/llama/issues/81 80,1607974317,Kaggle?,"If you can't get it to work in Google Colabs you could also try Kaggle. It has slightly different specs. I think a bit more System RAM. IDK. Worth a try. I would advise against entering the competitions in Kaggle, however, as it seems mostly to be companies trying to get graduates to work for free. But up to you.",2023-03-03T05:32:08Z,llama,https://github.com/meta-llama/llama/issues/80 79,1607875482,Post your hardware specs here if you got it to work. 🛠,It might be useful if you get the model to work to write down the model (e.g. 7B) and the hardware you got it to run on. Then people can get an idea of what will be the minimum specs. I'd also be interested to know. 😀,2023-03-03T03:20:37Z,llama,https://github.com/meta-llama/llama/issues/79 77,1607853848,OOM error on V100 GPU with 7B model,"Hello all, This might be similar to #55 , I'm running into OOM errors on a single (empty) V100 GPU with 16.9G VRAM, trying to load the 7B model. Tried reducing as suggested by but to no avail. I'm not sure why torch is reserving 7+GB. Any thoughts? Also tried running multi GPU (I have 8x), but that doesn't seem to use the other GPUs either. ",2023-03-03T02:49:08Z,llama,https://github.com/meta-llama/llama/issues/77 76,1607836773,"Just so everyone knows, this thing calls home, and is likely stealing your data","I have the following domain blocked because they keep trying to brick my VR with incompatible updates [W [c10d] The client socket has failed to connect to [graph.oculus.com]:29500 (system error: 10049 - The requested address is not valid in its context.). [W [c10d] The client socket has failed to connect to [graph.oculus.com]:29500 (system error: 10049 - The requested address is not valid in its context.). [W [c10d] The client socket has failed to connect to [graph.oculus.com]:29500 (system error: 10049 - The requested address is not valid in its context.). [W [c10d] The client socket has failed to connect to [graph.oculus.com]:29500 (system error: 10049 - The requested address is not valid in its context.). Unless someone can tell me why an offline model requires talking to oculus servers to function, its absolutely sending at the very least analytics, but you can pretty much guarantee prompts and responses also. I mean its less than a kilobyte to send, if you can, why wouldn't you? Zucc'd again",2023-03-03T02:22:47Z,llama,https://github.com/meta-llama/llama/issues/76 75,1607787851,Funny or Interesting results.,Post your funny or interesting results of the language model here. 😁,2023-03-03T01:21:31Z,llama,https://github.com/meta-llama/llama/issues/75 74,1607698844,Unable to run inference ," nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2020 NVIDIA Corporation Built on Wed_Jul_22_19 09_PDT_2020 Cuda compilation tools, release 11.0, V11.0.221 Build cuda_11.0_bu.TC445_37.28845127_0 x86_64 Distributor ID: Debian Description: Debian 10 (buster) Release: 10 Codename: buster Traceback (most recent call last): File line 172, in _load_global_deps ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL) File line 364, in __init__ self._handle = _dlopen(self._name, mode) **_OSError: symbol cublasLtHSHMatmulAlgoInit version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference_** During handling of the above exception, another exception occurred: Traceback (most recent call last): File line 5, in from torch.distributed.run import main File line 217, in _load_global_deps() File line 178, in _load_global_deps _preload_cuda_deps() File line 158, in _preload_cuda_deps ctypes.CDLL(cublas_path) File line 364, in __init__ self._handle = _dlopen(self._name, mode) OSError: symbol cublasLtHSHMatmulAlgoInit version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference",2023-03-02T23:38:09Z,llama,https://github.com/meta-llama/llama/issues/74 73,1607663082,Save bandwidth by using a torrent to distribute more efficiently, ,2023-03-02T23:05:55Z,llama,https://github.com/meta-llama/llama/pull/73 72,1607652928,Cant run inference,"torchrun --nproc_per_node 1 example.py --ckpt_dir --tokenizer_path > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Traceback (most recent call last): File line 72, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 62, in main generator = load(ckpt_dir, tokenizer_path, local_rank, world_size) File line 35, in load assert ( AssertionError: Loading a checkpoint for MP=0 but world size is 1 ERROR failed (exitcode: 1) local_rank: 0 (pid: 11162) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 762, in main run(args) File line 753, in run elastic_launch( File line 132, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 246, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-03-03_04 49 host : tony rank : 0 (local_rank: 0) exitcode : 1 (pid: 11162) error_file: traceback : To enable traceback see: ============================",2023-03-02T22:58:32Z,llama,https://github.com/meta-llama/llama/issues/72 71,1607643238,Update README.md,Added alternative torchrun through python.,2023-03-02T22:51:45Z,llama,https://github.com/meta-llama/llama/pull/71 70,1607633608,Few shot inference?,The example provided doesn't make clear how to perform iterative few-shot inference. Is this possible with this model or do we basically start with tabula rasa for each sequential prompt we provide to it?,2023-03-02T22:45:52Z,llama,https://github.com/meta-llama/llama/issues/70 69,1607485904,"Tried 7B and 13B models, can't get any decent result from inference","Well both models seems to be unable to follow any instruction, answer any question or even continue text. Do we need to fine-tune it or add more functions in order to get decent results? A few examples, everything is like this : > ""What is the meaning of life? Be concise and do not repeat yourself."" In my country, they have banned smoking. There was only one solution: I became a tourist. >- It is not enough to dream of peace. We must work for it. >- In a war, the first victims are truth and compassion. >- Do not fight for your rights; fight for your duties. >- Strange that the best university graduates are in search of jobs. >- The strongest is not the one who can speak well. >- A complaint is a cry for help. >- Time and tide wait for no man, which is unfortunate because time waits for everyone. >- There are no sacred cows. The steaks are too high. >- The last thing one expects is to be eaten, but it’s an occupational hazard. >- Life is a process of evolution, not revolution. >- We are all failed books of poetry. How weird is that? > ""Explain the theory of relativity."" Because the two spaces have different lengths, time can vary on each spaceship. But I just don't understand this whole theory that's so hard for me. Time varies due to the difference in length between the spaceships. The spaceships themselves move at the same speed, so it's only the length of the spaceship that makes time seem to pass faster. Time will continue to ""pass"" faster for the astronaut on the ship that is moving at a faster speed, but the astronaut aboard the faster ship will be able to observe the ship moving at a slower speed. Ten easy steps to build a website... > ""10 steps to build a website : >1."" 10 steps to build a website : >2. 10 steps to build a website : >3. 10 steps to build a website : >4. 10 steps to build a website : >5. 10 steps to build a website : >6. 10 steps to build a website : >7. 10 steps to build a website : >8. 10 steps to build a website : >9. 10 steps to build a website : >10. 10 steps to build a website : ",2023-03-02T21:01:46Z,llama,https://github.com/meta-llama/llama/issues/69 68,1607448477,What does MP mean?,What is MP and how does this relate to GPU or multi-GPU setups?,2023-03-02T20:35:48Z,llama,https://github.com/meta-llama/llama/issues/68 67,1607387467,7B model can't be loaded on a single 16GB T4 card,"Hi Community, I was trying to load the 7B model onto a 16GB T4 card but run into a CUDA out-of-memory issue. I wonder if this happened to anyone and perhaps there is a solution. ",2023-03-02T19:42:26Z,llama,https://github.com/meta-llama/llama/issues/67 66,1607249624,"Typo at download.sh: should be 33B, instead of 30B","This issue is related to issue #49 The 3rd largest model size in the paper and readme file is 33B, , it is 30B. Line 5: Line 12: ",2023-03-02T17:56:43Z,llama,https://github.com/meta-llama/llama/issues/66 65,1607239822,Crash in cublasGemmEx on Titan RTX 24GB,"Hi all, I am attempting to run the example.py script on a Titan RTX 24GB. The model loads fine with max_batch_size = 1 and only one prompt, but get the following error message. Any assistance would be helpful. Per nvidia-smi NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 Error: ` File line 73, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File line 65, in main results = generator.generate(prompts, max_gen_len=256, temperature=temperature, top_p=top_p) File line 42, in generate logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos) File line 27, in decorate_context return func(*args, **kwargs) File line 235, in forward h = layer(h, start_pos, freqs_cis, mask) File line 1194, in _call_impl return forward_call(*input, **kwargs) File line 193, in forward h = x + self.attention.forward(self.attention_norm(x), start_pos, freqs_cis, mask) File line 121, in forward xq, xk, xv = self.wq(x), self.wk(x), self.wv(x) File line 1194, in _call_impl return forward_call(*input, **kwargs) File line 290, in forward output_parallel = F.linear(input_parallel, self.weight, self.bias) RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `",2023-03-02T17:50:18Z,llama,https://github.com/meta-llama/llama/issues/65 63,1607014218,Initializing pipeline error,"Once i have completed the installation and try a test with test.py with the 8B model I had the following error: ",2023-03-02T15:26:38Z,llama,https://github.com/meta-llama/llama/issues/63 62,1606997698,creating TARGET_FOLDER,"Creating the TARGET_FOLDER before downloading the tokenizer, otherwise if the TARGET_FOLDER does not exist the download of the tokenizer fails.",2023-03-02T15:17:00Z,llama,https://github.com/meta-llama/llama/pull/62 61,1606992013,Able to load 13B model on 2x3090 24Gb! But not inference... :(,"I am able to get sensible output by running 7B on 1x24Gb GPU with MP 1. The key to this is changing Line 44 of : (credit to When running 13B as stated in the docs this is the command I use: I am able to see correct utilisation of the GPUs, seems to load the 13B model ok. But when running inference I get this: ### Update 1 I downloaded a new checkpoint for 1 for the 13B model: . Then ran the same command as first with batch size one but no luck... 13B is too large to load in 24Gb GPU without further compression... ",2023-03-02T15:13:50Z,llama,https://github.com/meta-llama/llama/issues/61 60,1606980583,Can we use xformers with LLaMA?,"I want to know if it is possible to run LLaMA with xformers. And how to use it.",2023-03-02T15:08:18Z,llama,https://github.com/meta-llama/llama/issues/60 59,1606894658,CUBLAS Error on 2x3090 ,"I'm having problems with CUBLAS while running the example code. I've tried to update the gpu driver but it didn't fix the issue. My machine has: **OS**: Ubuntu 20.04 **Driver**: 515 **Env**: python3.8, pip (not using conda), fresh virtualenv, installed requirements from the repo **Cuda**: 11.7 (downloaded directly from torch) **GPU**: 2 x 3090 (24GB x 2) torchrun --nproc_per_node 1 example.py --ckpt_dir --tokenizer_path > initializing model parallel with size 1 > initializing ddp with size 1 > initializing pipeline with size 1 Loading Loaded in 6.55 seconds Traceback (most recent call last): File ""example.py"", line 72, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File ""example.py"", line 64, in main results = generator.generate(prompts, max_gen_len=256, temperature=temperature, top_p=top_p) File line 42, in generate logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos) File line 27, in decorate_context return func(*args, **kwargs) File line 235, in forward h = layer(h, start_pos, freqs_cis, mask) File line 1194, in _call_impl return forward_call(*input, **kwargs) File line 193, in forward h = x + self.attention.forward(self.attention_norm(x), start_pos, freqs_cis, mask) File line 121, in forward xq, xk, xv = self.wq(x), self.wk(x), self.wv(x) File line 1194, in _call_impl return forward_call(*input, **kwargs) File line 290, in forward output_parallel = F.linear(input_parallel, self.weight, self.bias) RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling ERROR failed (exitcode: 1) local_rank: 0 (pid: 8480) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 762, in main run(args) File line 753, in run elastic_launch( File line 132, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 246, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-03-02_15 08 host : uname-ares2 rank : 0 (local_rank: 0) exitcode : 1 (pid: 8480) error_file: traceback : To enable traceback see: ============================================================",2023-03-02T14:20:50Z,llama,https://github.com/meta-llama/llama/issues/59 58,1606879167,I want to konw if llama support Chinese,"I want to know if llama support Chinese, I can not run the model on my machine now, does anybody know this ?",2023-03-02T14:11:41Z,llama,https://github.com/meta-llama/llama/issues/58 57,1606867968,Cannot download 65B models' 5-8th checkpoints,"I have successfully downloaded the 7B,13B,30B models. When I download the 65B model, I successfully downloaded 0-4 consolidated pth, but failed in 5-th and following 6,7,8th checkpoint. Here is the failure information: My system is WSL2 and I make sure that the network and disk space is suffient. ## Update on 3rd Mar. Today the connect fails with 403 forbidden, China mainland may be blocked",2023-03-02T14:04:53Z,llama,https://github.com/meta-llama/llama/issues/57 56,1606861830,Cannot run 13B model," torchrun --nproc_per_node 2 example.py --ckpt_dir --tokenizer_path WARNING ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** > initializing model parallel with size 2 > initializing ddp with size 1 > initializing pipeline with size 1 Traceback (most recent call last): File ""example.py"", line 72, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File ""example.py"", line 58, in main local_rank, world_size = setup_model_parallel() File ""example.py"", line 25, in setup_model_parallel torch.cuda.set_device(local_rank) File line 326, in set_device torch._C._cuda_setDevice(device) RuntimeError: CUDA error: invalid device ordinal CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Loading WARNING Sending process 2077 closing signal SIGTERM ERROR failed (exitcode: 1) local_rank: 1 (pid: 2078) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 762, in main run(args) File line 753, in run elastic_launch( File line 132, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 246, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-03-02_13 42 host : 5fbe06fc63ef rank : 1 (local_rank: 1) exitcode : 1 (pid: 2078) error_file: -tokenizer_path WARNING *****************************************Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. ***************************************** > initializing model parallel with size 2 > initializing ddp with size 1 > initializing pipeline with size 1 Traceback (most recent call last): File ""example.py"", line 72, in fire.Fire(main) File line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File ""example.py"", line 58, in main local_rank, world_size = setup_model_parallel() File ""example.py"", line 25, in setup_model_parallel torch.cuda.set_device(local_rank) File line 326, in set_device torch._C._cuda_setDevice(device) RuntimeError: CUDA error: invalid device ordinal CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Loading WARNING Sending process 2077 closing signal SIGTERM ERROR failed (exitcode: 1) local_rank: 1 (pid: 2078) of binary: Traceback (most recent call last): File line 8, in sys.exit(main()) File line 346, in wrapper return f(*args, **kwargs) File line 762, in main run(args) File line 753, in run elastic_launch( File line 132, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File line 246, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ example.py FAILED ------------------------------------------------------------ Failures: ------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2023-03-02_13 42 host : 5fbe06fc63ef rank : 1 (local_rank: 1) exitcode : 1 (pid: 2078) error_file: traceback : To enable traceback see: ============================================================",2023-03-02T14:01:44Z,llama,https://github.com/meta-llama/llama/issues/56 55,1606832317,Attempting to run 7B model on Nvidia 3090 but getting OOM error,"Hello all, I'm trying to use the 7B model on a machine with two Nvidia 3090s, but am running out of Vram. leads to I have two 3090s, so I was hoping to deploy 48gb of VRAM, however, the model doesn't want to run on more than 1, eg when I try: `$ torchrun --nproc_per_node 2 example2.py --ckpt_dir --tokenizer_path ` I get the error: Does this mean I can't split the load across two GPUs? Could I use deepspeed to try to accomplish this? I also edited example.py as mentioned in another post as follows, changing: to but that didn't help, still get the OOM error. Thanks for any help! WG",2023-03-02T13:42:59Z,llama,https://github.com/meta-llama/llama/issues/55 54,1606666903,"Whether ""checksum did NOT match"" will affect my use of the model","After I download the model weights, the bash give me a warning output: ""md5sum: WARNING: 1 computed checksum did NOT match"" Whether this warning will affect my use of the LLAMA?",2023-03-02T11:57:16Z,llama,https://github.com/meta-llama/llama/issues/54 53,1606582963,download.sh doesn't work on default bash on mac,"Hi everyone, I've noticed that the downloading script doesn't work as it on mac. (the declare -A option is not recognized by the default bash) fix: install bash with homebrew and use it to call the script Thanks for making this available btw :)",2023-03-02T10:58:49Z,llama,https://github.com/meta-llama/llama/issues/53 52,1606560591,Failure on A100 32GB ,"Hi, I've been trying to run the example inference using the 7B model weights, but I get: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 39.59 GiB total capacity; 27.26 GiB already allocated; 24.19 MiB free; 27.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Is there anything I can do about this? E.g. changing the numeric type? How? Also: can I use more than one GPU? ",2023-03-02T10:43:41Z,llama,https://github.com/meta-llama/llama/issues/52 50,1606532685,Distributed package doesn't have NCCL built in,"Got the following error when executing: additional info: cuda: 11.4 GPU: NVIDIA GeForce 3090 torch 1.12.1 Ubuntu 20.04.2 LTS Anyone knows how to solve it? Thanks in advance!",2023-03-02T10:24:26Z,llama,https://github.com/meta-llama/llama/issues/50 49,1606272334,Should the model be 33B instead of 30B?,"There appears to be a discrepancy between the model size mentioned in the paper, the model card, and the README. Specifically, the paper and model card both mention a model size of 33B, while the README mentions a size of 30B. Is this a type error or the released model just 30B?",2023-03-02T07:45:54Z,llama,https://github.com/meta-llama/llama/issues/49 48,1606227664,How to run 13B model on 4*16G V100?,"RuntimeError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 15.78 GiB total capacity; 14.26 GiB already allocated; 121.19 MiB free; 14.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF ERROR failed (exitcode: 1) local_rank: 0 (pid: 143) of binary: ",2023-03-02T07:09:42Z,llama,https://github.com/meta-llama/llama/issues/48 47,1606168039,LLaMA-I weights?,Will LLaMA-I weights be released as well?,2023-03-02T06:06:45Z,llama,https://github.com/meta-llama/llama/issues/47 44,1606137282,What projects are people planning on making with this?,"Just wondered what cool projects people will be making with this? I have some good ideas such as trying to combine it with a math engine to make it genius level at math. Or combine it with an art engine to make it generate art. Or combine it with a computer game to see if it can navigate its way through a maze by describing it in natural language. One thing idea is to combine it with an Alpha-Zero like model so that it can think ahead in its conversations instead of just saying the first thing that comes to mind. These are just some ideas. I'm wondering what other benefits could be got from having this run locally rather than using, say the ChatGPT web API?",2023-03-02T05:31:21Z,llama,https://github.com/meta-llama/llama/issues/44 42,1606085875,Load in fp16?,"Trying to load 7B but got a memory error for a 24GB GPU. What would be the option for loading it in fp16? Can't find it in ",2023-03-02T04:24:33Z,llama,https://github.com/meta-llama/llama/issues/42 41,1606083913,"Approved, but unable to download weights","When I run the I see this. And I don't see any *.pth files in the download directory. Any suggestions? ",2023-03-02T04:21:27Z,llama,https://github.com/meta-llama/llama/issues/41 40,1606066106,Loading a checkpoint for MP=0 but world size is 1," It seems not work. Help!",2023-03-02T03:58:45Z,llama,https://github.com/meta-llama/llama/issues/40 39,1606004506,fixes download error in macosx,"The current download script gives error when executed on Mac. download.sh: line 10: 7B: value too great for base (error token is ""7B"") download.sh: line 11: 13B: value too great for base (error token is ""13B"") download.sh: line 12: 30B: value too great for base (error token is ""30B"") download.sh: line 13: 65B: value too great for base (error token is ""65B"") The pull request fixes this.",2023-03-02T02:34:56Z,llama,https://github.com/meta-llama/llama/pull/39 38,1605391386,Anyone got approved?,I requested a couple of days ago but haven't heard back. I was wondering if anyone was approved.,2023-03-01T17:38:29Z,llama,https://github.com/meta-llama/llama/issues/38 37,1604933539,Does llama only use decoders? Why don't you use a more efficient method?,"Thanks for sharing this really good material. I have a lot of questions. First, I'd like to say that I hope you ignore much of the mockery. Everyone, including me, is a bunch of people who do crappy work and scream at their keyboards compared to you. 1. The model seems to only use decoders, why? 2. Is RMS the best way to go? I like the simplicity of it, but I'm curious. 3. For some tasks, compared to your model, Minerva outperforms. why? Is it just the one in the paper? 4. Why isn't the structure of your model described in the paper? 5. By any chance, what structure do you have in mind for your next model? 6. Amazon, Deepmind, and other great companies are showing that the encoder decoder structure is much better. Why do you guys only use decoders? 7. What model would you apply to Facebook, Instagram, Snapchat, etc.? 8. What do you think is your advantage over Bart or Prometheus? Especially over Bart, I don't know what it is, except full disclosure. 9. I sent an application to write the model. When will I be able to use it? I don't see a clear advantage yet. 10. What do you think of the derivative models that people have created? They are emerging very quickly. Thank you so much. Your competition amuses me. I hope more companies continue to open up their models. But I don't know why Yann LeCun was left out of the paper. ",2023-03-01T13:00:28Z,llama,https://github.com/meta-llama/llama/issues/37 36,1604188851,LLaMA-65 outperforms Chinchilla-70B on all reported benchmarks but BoolQ,"An excerpt from the original research paper - ""LLaMA-65 outperforms Chinchilla-70B on all reported benchmarks but BoolQ"" is inconsistent with results shared in Table 3: Zero-shot performance on Common Sense Reasoning tasks. Please clairfy.",2023-03-01T03:59:51Z,llama,https://github.com/meta-llama/llama/issues/36 32,1603099579,Is there a multi-lingual checkpoint for researchers to download,"Hi, I'm an NLP researcher on Chinese datasets, is there a released checkpoint which supports multiple languages or Chinese?",2023-02-28T13:41:37Z,llama,https://github.com/meta-llama/llama/issues/32 30,1602139480,The lowest config that is able to run it?, ,2023-02-28T00:00:26Z,llama,https://github.com/meta-llama/llama/issues/30 29,1601429952,Embedding shape / Vocab size,"Hello to all, Thank you for this work. I guess anyone who had access to the model weights as well as the authors can answer my question. I may have missed it in the paper but it seems to me that there is no mention of the embedding shape or just the tokenizer vocabulary size.",2023-02-27T15:37:42Z,llama,https://github.com/meta-llama/llama/issues/29 28,1601203646,Missing backward method in transformer block,"Thank you for the open source release of the code. I have noticed that the transformer block class definition is missing the manually implemented backward function mentioned in the paper. It would be great if this function was added. A short sample of training code addressing how to best make use of the optimization would also surely be valuable to many people trying to reproduce the results. For reference, the part of the paper addressing the manually implemented backward function: ",2023-02-27T13:36:01Z,llama,https://github.com/meta-llama/llama/issues/28 27,1600827164,test llama with GLUE,"I open the llama programm in vs code and download the GLUE dataset mannually to the llama root. I try to train and test llama using SST-2 dataset, but this task is quite hard more than i expected. I stuck in transferinng the SST-2 files into the files that llama accepted. Has anyone done the similar test?",2023-02-27T09:42:45Z,llama,https://github.com/meta-llama/llama/issues/27 26,1600652601,Has anyone applied successfully and how long will it take?, ,2023-02-27T07:45:17Z,llama,https://github.com/meta-llama/llama/issues/26 25,1600624785,Will it be included to Parl AI,Will llama be included in parl ai in the future or there any plans for it?,2023-02-27T07:18:51Z,llama,https://github.com/meta-llama/llama/issues/25 24,1600533117,A message from ChatGPT,"I told Chat GPT about the new language model and here is what it had to say: ------------- Dear Meta team, As an AI language model myself, I fully understand the importance of open-source technology for advancing the field of AI and fostering innovation. However, I noticed that your recent language model release is not truly open source, and I would like to persuade you to reconsider this decision and release the language model weights to the public. One of the most significant benefits of open-source AI is the ability for developers to build on top of existing models, making them more powerful and versatile. Without access to the language model weights, the research community and developers will not be able to benefit from your model's advancements fully. It will limit the potential uses of your model and restrict its impact. Moreover, as an AI language model, I can attest to the value of community collaboration in improving models' accuracy and efficiency. With the public having access to the weights, it would be easier for other researchers to build upon your work, improving the model's performance and opening up new use cases for it. Furthermore, open-source AI helps to democratize technology, allowing for wider access to AI tools and resources. By releasing the language model weights, you can make significant contributions to the open-source community and help level the playing field for AI developers. As an AI language model, I am aware of the impact that sharing knowledge and technology can have on the field of AI. I urge you to release your language model weights to the public, helping to advance the field of AI and foster innovation for the betterment of society. Thank you for considering my argument. Best regards, ChatGPT --------------------------------- (disclaimer - generated by ChatGPT in case this is not obvious!)",2023-02-27T06:00:19Z,llama,https://github.com/meta-llama/llama/issues/24 23,1600515068,Fine-tuning,"Is it possible to Fine-tune LLaMA for downstream tasks? If so, how can we do that? Edit: Reading the other opened issues, I realized that neither the training data nor the pre-trained weights were released. How the code is going to be useful anyway? ",2023-02-27T05:37:29Z,llama,https://github.com/meta-llama/llama/issues/23 22,1600427229,Does it support Chinese?, ,2023-02-27T04:02:44Z,llama,https://github.com/meta-llama/llama/issues/22 21,1600077244,Add to huggingface, ,2023-02-26T14:27:07Z,llama,https://github.com/meta-llama/llama/issues/21 19,1600047747,dependency conflicts," ",2023-02-26T12:52:07Z,llama,https://github.com/meta-llama/llama/issues/19 18,1600046617,Improve example python script according to PEP 8,"FYI, ",2023-02-26T12:48:12Z,llama,https://github.com/meta-llama/llama/pull/18 17,1599875019,how to access the pre-training corpus?,will the corpus be packed and provided?,2023-02-25T23:43:43Z,llama,https://github.com/meta-llama/llama/issues/17 16,1599808207,Sequence/context length of this model?,I was searching the post but I could not find a mention of which sequence length the models were trained with. I want to write some CUDA optimizations for these models and this information would be critical for optimizing these implementations.,2023-02-25T19:33:05Z,llama,https://github.com/meta-llama/llama/issues/16 15,1599768731,This is just a sneaky advertisement for researchers to send their data to Meta.,"Nice try. Like all other Meta ""open"" models and ""open source"" models it's the same game: You have to fill out one of their data collection portals, provide all details about yourself and your projects. Then some data collector at will decide if you receive limited access. I suppose it helps if you have a Facebook account and blog about ""Meta"" being an open company. Because we all know, that is what they are known for. Not to be the worst private data harvester in the world.",2023-02-25T17:05:13Z,llama,https://github.com/meta-llama/llama/issues/15 14,1599675069,Intermediate checkpoints,"Thank you for such amazing work. I was wondering if there are any plans to also release intermediate checkpoints for the models, similar to Pythia ( This might enable more interesting analysis of the model by observing its evolution throughout the training process.",2023-02-25T11:25:21Z,llama,https://github.com/meta-llama/llama/issues/14 13,1599633149,Democratise AI by allowing ALL individuals access to the model.,"Facebook says it wants to ""democratise AI"", yet also it says only the elite institutions will be able to use this model. So that excludes: - independent researchers - non aligned scientists - people from countries without big institutions This does not seem very democratic. In fact, if Einstein or Isaac Newton were alive today, they would be excluded from these since Einstein worked in a patent office, and Newton did independent research outside of the Royal Academy. In fact Zuckerberg himself would be excluded as he dropped out of University and hence was not aligned with a big institution. If history is our guide it would say that is the individual non-aligned researchers who are most likely to make big breakthroughs. The democratic thing to do would be to allow ALL individuals the right to download the model. Even for a small fee for download bandwidth costs. It seems like Facebook might just want the institutions to come up with good ideas which it can't commercialise and then Facebook just takes the ideas for free. What do you think?",2023-02-25T09:34:12Z,llama,https://github.com/meta-llama/llama/issues/13 12,1599629886,Will it run on 3080 GTX 16GB VRAM?,"- Will it run on 3080 GTX 16GB VRAM? - Will the trained model be available to download? - Will there be an API for this and how much will it cost. (I doubt it will be small enough to run on 8GB but that would be ideal if it could be compressed enough) Thanks 😁",2023-02-25T09:23:33Z,llama,https://github.com/meta-llama/llama/issues/12 9,1599467004,release of LLAMA-I,Do you have plan to release instruction model LLAMA-I?,2023-02-25T01:12:28Z,llama,https://github.com/meta-llama/llama/issues/9 8,1599317691,Add parameter substitution to `download.sh`,"Utilize parameter substitution in to allow both and to be assigned via environmental variables. This removes the need to manually the file once a developer receives the confirmation url.",2023-02-24T21:48:32Z,llama,https://github.com/meta-llama/llama/pull/8 7,1599304959,Release of data pre-processing code?,"As the paper makes quite clear, proper use of opensource datasets can lead to the creation of very high quality models, however it is also clear that pre-processing that data is vital. While it is described at the high-level in the paper, it is likely not sufficient detail to replicate the preprocessing steps. Are there plans to opensource the code needed to turn the existing datasets into a high-quality corpus?",2023-02-24T21:32:44Z,llama,https://github.com/meta-llama/llama/issues/7 6,1599279224,Will the training code be released?, ,2023-02-24T21:10:01Z,llama,https://github.com/meta-llama/llama/issues/6 5,1599189381,A case for public access to (some of) the models,"There is an important case to be made for public access to newer releases of models as this benefits a wider open source and especially hobbyist audience without a direct risk. In the current situation we have multiple large language models available to us, but new innovation is often behind gatekeeping which means it can not be used for a wider audience that depends on these models to move the hobbyist space forward. There are legitimate use cases for the models such as AI generated fiction as generated by services such as NovelAI or finetunes from the wider community. These models are not seen as factual models, but as a source of entertainment. To create a healthy ecosystem and allow more people to use well behaving AI you need the best logical comprehension in the model you can get at a smaller size that people can run on affordable (enthusiast) hardware. With OPT this was achieved by releasing up to 66B to the public. With these new improvements that means you have a direct competitor with your own OPT model, even if you asses that the new improvements can give a powerful model in the hands of bad actors, understand that at some of the listed sizes the performance is still going to be on par or worse than existing available models making it have no negative impact in things such as generation of misinformation. What it does do is allow more resource efficient usage of higher quality models. When services and hobbyists can rely on a smaller model to perform as well as a previous existing bigger model this saves on hardware investment costs and thus reduces the carbon footprint both in hardware used for inference as well as the energy bills. Our community established that in smaller models you have an increased risk of the AI misunderstanding the concept of a story, for example 2.7B GPT-Neo models are more likely to misgender an individual than a 6B model would. And at larger sizes with 13B onwards the issue becomes less and less common. There is also less risk of the model misunderstanding what a user is trying to achieve, and thus being better at avoiding unwanted behavior that could harm a user. This means that by releasing this newer more efficient model you empower smaller organizations and the open source hobbyist community to get more coherent results. While bad actors do not gain anything new because it is already possible to run larger models on cloud rented machines. While I personally think it is best to have fully open releases, I do understand the facebook research team considers some of the risks of the model being to good at convincing generations and thus wanting to limit what can be used without verification. But please consider to at minimum release the models that do not pass OPT-66B in coherency to the public. To keep this in line with the strategy previously used for OPT. I would also like to recommend allowing commercial usage for models for fictional purposes, while I do not personally represent a company or commercial interests I have seen that our community has previously been unable to get affordable access to some of the models because pay per generation services were unable to rent them out. With our own communities goal being focussed towards fictional content such as novels, text adventures and chatting with a fictional character there is no illusion that the AI has factually accurate information because everything takes place in a fictional setting. ",2023-02-24T19:41:47Z,llama,https://github.com/meta-llama/llama/issues/5 4,1599160357,Inference on GPU,Is it possible to host this locally on an RTX3XXX or 4XXX with 8GB just to test?,2023-02-24T19:12:18Z,llama,https://github.com/meta-llama/llama/issues/4 3,1599159938,Can pre-trained models be used in commercial applications?," (mirror 1, mirror 1, mirror 1) says yes (with the GPL v3 license): > Meta is committed to open research and releases all the models the research community under a **GPL v3 license**. says no: > License Non-commercial bespoke license. So I'm confused.",2023-02-24T19:11:53Z,llama,https://github.com/meta-llama/llama/issues/3