Saturday, 20 January 2024

Code Llama

 Inference code for CodeLlama models.

Code Llama is a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B and 34B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. 7B and 13B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama was developed by fine-tuning Llama 2 using a higher sampling of code. As with Llama 2, we applied considerable safety mitigations to the fine-tuned versions of the model. For detailed information on model training, architecture and parameters, evaluations, responsible AI and safety refer to our research paper. Output generated by code generation features of the Llama Materials, including Code Llama, may be subject to third party licenses, including, without limitation, open source licenses.

We are unlocking the power of large language models and our latest version of Code Llama is now accessible to individuals, creators, researchers and businesses of all sizes so that they can experiment, innovate and scale their ideas responsibly. This release includes model weights and starting code for pretrained and fine-tuned Llama language models — ranging from 7B to 34B parameters.

This repository is intended as a minimal example to load Code Llama models and run inference.

Download

In order to download the model weights and tokenizers, please visit the Meta website and accept our License.

Once your request is approved, you will receive a signed URL via email. Then run the download.sh script, passing the URL provided when prompted to start the download. Make sure that you copy the URL text itself; do not use the 'Copy link address' option when you right-click the URL. If the copied URL text starts with https://download.llamameta.net, you copied it correctly. If it starts with https://l.facebook.com, you copied it the wrong way.

Prerequisites: make sure you have wget and md5sum installed. Then run the script: bash download.sh.

Keep in mind that the links expire after 24 hours and a certain number of downloads. If you start seeing errors such as 403: Forbidden, you can always re-request a link.

Model sizes

Model   Size
7B      ~12.55 GB
13B     24 GB
34B     63 GB

Setup

In a conda env with PyTorch / CUDA available, clone the repo and run in the top-level directory:

pip install -e .

Inference

Different models require different model-parallel (MP) values:

Model   MP
7B      1
13B     2
34B     4

All models support sequence lengths up to 100,000 tokens, but we pre-allocate the cache according to the max_seq_len and max_batch_size values, so set those according to your hardware and use case.
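For illustration, here is a minimal sketch of where those two values enter, following the Llama.build call used by the repo's example scripts (the exact signature may differ across versions):

from llama import Llama

# The KV cache is pre-allocated from max_seq_len and max_batch_size at build
# time, so choose values that your GPU memory can actually hold.
generator = Llama.build(
    ckpt_dir="CodeLlama-7b/",
    tokenizer_path="CodeLlama-7b/tokenizer.model",
    max_seq_len=512,   # upper bound on prompt + generated tokens per sequence
    max_batch_size=4,  # upper bound on prompts handled per call
)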

Pretrained Code Models

The Code Llama and Code Llama - Python models are not fine-tuned to follow instructions. They should be prompted so that the expected answer is the natural continuation of the prompt.

See example_completion.py for some examples. To illustrate, see the command below to run it with the CodeLlama-7b model (nproc_per_node needs to be set to the MP value):

torchrun --nproc_per_node 1 example_completion.py \
    --ckpt_dir CodeLlama-7b/ \
    --tokenizer_path CodeLlama-7b/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
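Roughly, what that script does is the following (a sketch reusing the generator built above; the text_completion call and its "generation" output field follow the repo's example code, so verify against your version):

# Write the prompt so the answer is a natural continuation of the code.
prompt = '''def fizzbuzz(n: int) -> str:
    """Return "Fizz", "Buzz", "FizzBuzz", or str(n) as appropriate."""
'''
results = generator.text_completion(
    [prompt], max_gen_len=64, temperature=0.2, top_p=0.9,
)
print(prompt + results[0]["generation"])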

Pretrained code models are: the Code Llama models CodeLlama-7b, CodeLlama-13b, CodeLlama-34b and the Code Llama - Python models CodeLlama-7b-Python, CodeLlama-13b-Python, CodeLlama-34b-Python.

Code Infilling

Code Llama and Code Llama - Instruct 7B and 13B models are capable of filling in code given the surrounding context.

See example_infilling.py for some examples. The CodeLlama-7b model can be run for infilling with the command below (nproc_per_node needs to be set to the MP value):

torchrun --nproc_per_node 1 example_infilling.py \
    --ckpt_dir CodeLlama-7b/ \
    --tokenizer_path CodeLlama-7b/tokenizer.model \
    --max_seq_len 192 --max_batch_size 4
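A sketch of what infilling looks like in code (the prefix/suffix pair mirrors the repo's example; the text_infilling method and its output field are taken from the example code and may vary by version):

# The model fills in the missing docstring between prefix and suffix.
prefix = '''def remove_non_ascii(s: str) -> str:
    """'''
suffix = '''
    return result
'''
results = generator.text_infilling(
    prefixes=[prefix], suffixes=[suffix], max_gen_len=64,
)
print(prefix + results[0]["generation"] + suffix)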

Pretrained infilling models are: the Code Llama models CodeLlama-7b and CodeLlama-13b and the Code Llama - Instruct models CodeLlama-7b-Instruct, CodeLlama-13b-Instruct.

Fine-tuned Instruction Models

Code Llama - Instruct models are fine-tuned to follow instructions. To get the expected features and performance for them, a specific formatting defined in chat_completion needs to be followed, including the INST and <<SYS>> tags, BOS and EOS tokens, and the whitespaces and linebreaks in between (we recommend calling strip() on inputs to avoid double-spaces). You can use chat_completion directly to generate answers with the instruct model.
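A sketch of calling chat_completion directly (assuming a generator built from an Instruct checkpoint; the dialog structure and output field follow the repo's example code):

# chat_completion applies the [INST] / <<SYS>> formatting and the BOS/EOS
# tokens for you; inputs are plain role/content dicts.
dialogs = [[
    {"role": "system", "content": "Provide answers in Python."},
    {"role": "user", "content": "Write a function that reverses a string."},
]]
results = generator.chat_completion(
    dialogs, max_gen_len=None, temperature=0.2, top_p=0.95,
)
print(results[0]["generation"]["content"])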

You can also deploy additional classifiers for filtering out inputs and outputs that are deemed unsafe. See the llama-recipes repo for an example of how to add a safety checker to the inputs and outputs of your inference code.

Examples using CodeLlama-7b-Instruct:

torchrun --nproc_per_node 1 example_instructions.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4

Fine-tuned instruction-following models are: the Code Llama - Instruct models CodeLlama-7b-Instruct, CodeLlama-13b-Instruct, CodeLlama-34b-Instruct.

Code Llama is a new technology that carries potential risks with use. Testing conducted to date has not — and could not — cover all scenarios. In order to help developers address these risks, we have created the Responsible Use Guide. More details can be found in our research papers as well.

Issues

Please report any software “bug” or other problems with the models through one of the following means:

  • Reporting issues with the model: github.com/facebookresearch/codellama
  • Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
  • Reporting bugs and security concerns: facebook.com/whitehat/info

Model Card

See MODEL_CARD.md for the model card of Code Llama.

References

  1. Code Llama Research Paper
  2. Code Llama Blog Post

from https://github.com/facebookresearch/codellama 

---------------------------------

Code Llama is fine-tuned from the Llama 2 base model and comes in three versions: a base version (Code Llama), a Python-specialized version (Code Llama - Python), and a natural-language instruction-tuned version (Code Llama - Instruct).

Each of the three versions is available in 7B, 13B, and 34B sizes, and every model was trained on 500 billion tokens of code and code-related data.

Meta hopes Code Llama will spur further development on top of Llama 2 and become a new creative tool for research and commercial products.

Features

• Supports 100k-token context (you can feed in an entire project directly)

• Supports Python, C++, Java, PHP, TypeScript (JavaScript), SQL, C#, Bash, and other languages

• The Python 34B version scores 53.7% on HumanEval and 56.2% on MBPP, beating GPT-3.5's 48.1% and 52.2%

• Open source and licensed for commercial use

Remarkably, Code Llama also has an unreleased "Unnatural" version whose performance already surpasses ChatGPT and approaches GPT-4.

------------------------

Meta open-sources "Code Llama", a large coding model with performance approaching GPT-4

Introducing Code Llama, a state-of-the-art large language model for coding

Takeaways

  • Code Llama is a state-of-the-art LLM capable of generating code, and natural language about code, from both code and natural language prompts.
  • Code Llama is free for research and commercial use.
  • Code Llama is built on top of Llama 2 and is available in three models:
    • Code Llama, the foundational code model;
    • Code Llama - Python, specialized for Python;
    • and Code Llama - Instruct, which is fine-tuned for understanding natural language instructions.
  • In our own benchmark testing, Code Llama outperformed state-of-the-art publicly available LLMs on code tasks.


  • Code Llama research paper
  • Code Llama GitHub
  • Download the Code Llama model

    Today, we are releasing Code Llama, a large language model (LLM) that can use text prompts to generate code. Code Llama is state-of-the-art among publicly available LLMs on code tasks, and it has the potential to make workflows faster and more efficient for current developers, to lower the barrier to entry for people who are learning to code, and to serve as a productivity and educational tool that helps programmers write more robust, well-documented software.

    The generative AI space is evolving rapidly, and we believe an open approach to today’s AI is the best one for developing new AI tools that are innovative, safe, and responsible. We are releasing Code Llama under the same community license as Llama 2.

    How Code Llama works

    Code Llama is a code-specialized version of Llama 2 that was created by further training Llama 2 on its code-specific datasets, sampling more data from that same dataset for longer. Essentially, Code Llama features enhanced coding capabilities, built on top of Llama 2. It can generate code, and natural language about code, from both code and natural language prompts (e.g., “Write me a function that outputs the Fibonacci sequence.”). It can also be used for code completion and debugging. It supports many of the most popular languages being used today, including Python, C++, Java, PHP, Typescript (Javascript), C#, and Bash.



    We are releasing three sizes of Code Llama with 7B, 13B, and 34B parameters respectively. Each of these models is trained with 500B tokens of code and code-related data. The 7B and 13B base and instruct models have also been trained with fill-in-the-middle (FIM) capability, allowing them to insert code into existing code, meaning they can support tasks like code completion right out of the box.

    The three models address different serving and latency requirements. The 7B model, for example, can be served on a single GPU. The 34B model returns the best results and allows for better coding assistance, but the smaller 7B and 13B models are faster and more suitable for tasks that require low latency, like real-time code completion.



    The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.

    Aside from being a prerequisite for generating longer programs, having longer input sequences unlocks exciting new use cases for a code LLM. For example, users can provide the model with more context from their codebase to make the generations more relevant. It also helps in debugging scenarios in larger codebases, where staying on top of all code related to a concrete issue can be challenging for developers. When developers are faced with debugging a large chunk of code, they can pass the entire length of the code into the model.


    Additionally, we have further fine-tuned two additional variations of Code Llama: Code Llama - Python and Code Llama - Instruct.

    Code Llama - Python is a language-specialized variation of Code Llama, further fine-tuned on 100B tokens of Python code. Because Python is the most benchmarked language for code generation – and because Python and PyTorch play an important role in the AI community – we believe a specialized model provides additional utility.

    Code Llama - Instruct is an instruction fine-tuned and aligned variation of Code Llama. Instruction tuning continues the training process, but with a different objective. The model is fed a “natural language instruction” input and the expected output. This makes it better at understanding what humans expect out of their prompts. We recommend using Code Llama - Instruct variants whenever using Code Llama for code generation since Code Llama - Instruct has been fine-tuned to generate helpful and safe answers in natural language.

    We do not recommend using Code Llama or Code Llama - Python to perform general natural language tasks, since neither of these models is designed to follow natural language instructions. Code Llama is specialized for code-specific tasks and isn’t appropriate as a foundation model for other tasks.

    When using the Code Llama models, users must abide by our license and acceptable use policy.


    Evaluating Code Llama’s performance

    To test Code Llama’s performance against existing solutions, we used two popular coding benchmarks: HumanEval and Mostly Basic Python Programming (MBPP). HumanEval tests the model’s ability to complete code based on docstrings and MBPP tests the model’s ability to write code based on a description.
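    To make the task format concrete, here is an illustration in the spirit of a HumanEval problem (adapted from the benchmark's first task; the body shown stands in for a model completion, which the benchmark checks with unit tests):

    # The model receives the signature and docstring and must produce the body.
    def has_close_elements(numbers: list[float], threshold: float) -> bool:
        """Check if any two numbers in the list are closer than threshold."""
        return any(
            abs(a - b) < threshold
            for i, a in enumerate(numbers)
            for b in numbers[i + 1:]
        )

    assert has_close_elements([1.0, 2.0, 3.0], 0.5) is False
    assert has_close_elements([1.0, 2.8, 3.0], 0.5) is True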

    Our benchmark testing showed that Code Llama performed better than open-source, code-specific LLMs and outperformed Llama 2. Code Llama 34B, for example, scored 53.7% on HumanEval and 56.2% on MBPP, the highest compared with other state-of-the-art open solutions, and on par with ChatGPT.



    As with all cutting-edge technology, Code Llama comes with risks. Building AI models responsibly is crucial, and we undertook numerous safety measures before releasing Code Llama. As part of our red teaming efforts, we ran a quantitative evaluation of Code Llama’s risk of generating malicious code. We created prompts that attempted to solicit malicious code with clear intent and scored Code Llama’s responses to those prompts against ChatGPT’s (GPT-3.5 Turbo). Our results found that Code Llama answered with safer responses.

    Details about our red teaming efforts from domain experts in responsible AI, offensive security engineering, malware development, and software engineering are available in our research paper.

    Releasing Code Llama

    Programmers are already using LLMs to assist in a variety of tasks, ranging from writing new software to debugging existing code. The goal is to make developer workflows more efficient, so they can focus on the most human centric aspects of their job, rather than repetitive tasks.

    At Meta, we believe that AI models, and LLMs for coding in particular, benefit most from an open approach, both in terms of innovation and safety. Publicly available, code-specific models can facilitate the development of new technologies that improve people's lives. By releasing code models like Code Llama, the entire community can evaluate their capabilities, identify issues, and fix vulnerabilities.

    Code Llama’s training recipes are available on our Github repository.

    Model weights are also available.

    Responsible use

    Our research paper discloses details of Code Llama’s development as well as how we conducted our benchmarking tests. It also provides more information about the model’s limitations, known challenges we encountered, mitigations we’ve taken, and future challenges we intend to investigate.

    We’ve also updated our Responsible Use Guide and it includes guidance on developing downstream models responsibly, including:

    • Defining content policies and mitigations.
    • Preparing data.
    • Fine-tuning the model.
    • Evaluating and improving performance.
    • Addressing input- and output-level risks.
    • Building transparency and reporting mechanisms in user interactions.

    Developers should evaluate their models using code-specific evaluation benchmarks and perform safety studies on code-specific use cases such as generating malware, computer viruses, or malicious code. We also recommend leveraging safety datasets for automatic and human evaluations, and red teaming on adversarial prompts.

    The future of generative AI for coding

    Code Llama is designed to support software engineers in all sectors – including research, industry, open source projects, NGOs, and businesses. But there are still many more use cases to support than what our base and instruct models can serve.

    We hope that Code Llama will inspire others to leverage Llama 2 to create new innovative tools for research and commercial products.

    Try Code Llama today

    Download the Code Llama Model

    Read the research paper

     -------------------------------------------------------------------

    Deploying Llama 3 locally with Ollama and using the model through open-webui

    Llama 3 is an open-source language model family released by Meta (Facebook) AI, comprising an 8B (8 billion parameter) model and a 70B (70 billion parameter) model. Llama 3 supports a variety of commercial and research uses and has demonstrated excellent performance on several industry-standard benchmarks.

    Llama 3 uses an optimized autoregressive Transformer architecture, designed for complex text-generation tasks and effective at improving the coherence and relevance of the generated text. The models combine supervised fine-tuning (SFT) with reinforcement learning from human feedback (RLHF); this hybrid approach improves both helpfulness and safety, making the models more reliable in practice and better aligned with user expectations.

    1. Install Ollama

    Ollama website: https://ollama.com/

    Download Ollama for your operating system.
    Download page: https://ollama.com/download

    Once downloaded, double-click the installer to run it and accept all the defaults.

    2. Download an AI model through Ollama

    Open Ollama's model library page: https://ollama.com/library

    Find llama3, click through to its page, and copy the install command (the same procedure works for any other model):

    ollama run llama3

    Press "Win"+"R", type "cmd", and press Enter; right-click to paste the command you just copied, then press Enter again to download and install llama3.

    Other commonly used ollama commands include:

      serve       Start ollama
      create      Create a model from a Modelfile
      show        Show information for a model
      run         Run a model
      pull        Pull a model from a registry
      push        Push a model to a registry
      list        List models
      cp          Copy a model
      rm          Remove a model
      help        Help about any command
    

     

    Once you see the success prompt, the installation is complete; from there, just type your question to start a conversation.

    The next time you want to chat, type ollama run llama3 in a CMD window again to start a session.

    Using the model from the CMD command line like this is inconvenient, so I will install a free, open-source web interface such as open-webui or lobe-chat to use with it.
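    These web front ends talk to Ollama's local HTTP API, which listens on port 11434 by default. As a sanity check, you can also call that API directly, for example from Python via the documented /api/generate endpoint:

    import json
    import urllib.request

    # Non-streaming generation request against the local Ollama server.
    payload = {"model": "llama3", "prompt": "Why is the sky blue?", "stream": False}
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])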

    3. Download, install, and use open-webui

    1. Enable Microsoft Hyper-V: open "Control Panel -> Programs -> Turn Windows features on or off"

    • Check the Hyper-V option

    • Restart the computer to finish enabling it

    2. Install the Docker environment.

    Docker website: https://docker.com/

    Download Docker Desktop.
    Download page: https://docs.docker.com/desktop/install/windows-install/

    After downloading, double-click the exe file to run it. Be sure to uncheck "Use WSL 2 instead of Hyper-V (recommended)", otherwise it will cause a lot of problems (a lesson learned the hard way).

    Wait for the installation to finish.

    When installation completes, click "Close and restart" to reboot the computer.

    After the system restarts, double-click the "Docker Desktop" icon on the desktop and click "Accept" in the pop-up window.

    Click "Continue without signing in" to proceed without logging in.

    Click "Skip survey".

    You are now in the Docker Desktop interface.

    Switch to a domestic (China) registry mirror (Settings ⚙ -> Docker Engine): paste the content below and click "Apply & restart" to save and restart Docker.

    {
      "registry-mirrors": [
        "https://82m9ar63.mirror.aliyuncs.com",
        "http://hub-mirror.c.163.com",
        "https://docker.mirrors.ustc.edu.cn"
      ],
      "builder": {
        "gc": {
          "defaultKeepStorage": "20GB",
          "enabled": true
        }
      },
      "experimental": false,
      "features": {
        "buildkit": true
      }
    }
    

     

    3. Install the open-webui service with Docker.

    Press "Win"+"R", type "cmd", and press Enter; then right-click to paste the command below and press Enter.

    docker run -d -p 3000:8080 \
        --add-host=host.docker.internal:host-gateway \
        -v open-webui:/app/backend/data \
        --name open-webui --restart always \
        ghcr.io/open-webui/open-webui:main

    Depending on your network conditions, this installation may take quite a while; please be patient until it finishes.

    Open the Docker Desktop interface again and you will see the open-webui service already running.

    Click the port number under "Port(s)" to open the open-webui web page.

    Click "Sign up" to open the registration page, fill in your details, and click "Create Account"; the first account registered becomes the administrator.

    After registering, click the settings ⚙ in the top-right corner -> General -> Language, and select Chinese if you want to switch the interface to Chinese.

    You can then select the llama3:8b model deployed earlier and start using it.

     

     
