Inference code for CodeLlama models.
Code Llama is a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B and 34B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. 7B and 13B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama was developed by fine-tuning Llama 2 using a higher sampling of code. As with Llama 2, we applied considerable safety mitigations to the fine-tuned versions of the model. For detailed information on model training, architecture and parameters, evaluations, responsible AI and safety refer to our research paper. Output generated by code generation features of the Llama Materials, including Code Llama, may be subject to third party licenses, including, without limitation, open source licenses.
We are unlocking the power of large language models and our latest version of Code Llama is now accessible to individuals, creators, researchers and businesses of all sizes so that they can experiment, innovate and scale their ideas responsibly. This release includes model weights and starting code for pretrained and fine-tuned Llama language models — ranging from 7B to 34B parameters.
This repository is intended as a minimal example to load Code Llama models and run inference.
Download
In order to download the model weights and tokenizers, please visit the Meta website and accept our License.
Once your request is approved, you will receive a signed URL over email. Then run the download.sh script, passing the URL provided when prompted to start the download. Make sure that you copy the URL text itself; do not use the 'Copy link address' option when you right-click the URL. If the copied URL text starts with https://download.llamameta.net, you copied it correctly; if it starts with https://l.facebook.com, you copied it the wrong way.
Pre-requisites: make sure you have wget and md5sum installed. Then run the script: bash download.sh.
Keep in mind that the links expire after 24 hours and a certain amount of downloads. If you start seeing errors such as 403: Forbidden, you can always re-request a link.
Model sizes
Model | Size |
---|---|
7B | ~12.55GB |
13B | 24GB |
34B | 63GB |
Setup
In a conda env with PyTorch / CUDA available, clone the repo and run in the top-level directory:
pip install -e .
Inference
Different models require different model-parallel (MP) values:
Model | MP |
---|---|
7B | 1 |
13B | 2 |
34B | 4 |
All models support sequence lengths up to 100,000 tokens, but we pre-allocate the cache according to the max_seq_len and max_batch_size values. So set those according to your hardware and use case.
Pretrained Code Models
The Code Llama and Code Llama - Python models are not fine-tuned to follow instructions. They should be prompted so that the expected answer is the natural continuation of the prompt.
See example_completion.py for some examples. To illustrate, see the command below to run it with the CodeLlama-7b model (nproc_per_node needs to be set to the MP value):
torchrun --nproc_per_node 1 example_completion.py \
--ckpt_dir CodeLlama-7b/ \
--tokenizer_path CodeLlama-7b/tokenizer.model \
--max_seq_len 128 --max_batch_size 4
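Under the hood, example_completion.py builds a generator and requests continuations of code prompts. A minimal sketch of that flow, assuming the Llama.build / text_completion interface used by the example scripts in this repo (exact argument names may differ between versions, and the script must still be launched via torchrun as shown above):

# Minimal sketch of a completion call, modeled on example_completion.py.
# Assumes the Llama.build / text_completion interface from this repo.
from llama import Llama

generator = Llama.build(
    ckpt_dir="CodeLlama-7b/",
    tokenizer_path="CodeLlama-7b/tokenizer.model",
    max_seq_len=128,      # the KV cache is pre-allocated from these two values
    max_batch_size=4,
)

# Base models are not instruction-tuned: phrase the prompt so that the
# desired code is its natural continuation.
prompts = [
    'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n',
]
results = generator.text_completion(prompts, max_gen_len=64, temperature=0.2, top_p=0.95)
for prompt, result in zip(prompts, results):
    print(prompt + result["generation"])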
Pretrained code models are: the Code Llama models CodeLlama-7b, CodeLlama-13b, and CodeLlama-34b, and the Code Llama - Python models CodeLlama-7b-Python, CodeLlama-13b-Python, and CodeLlama-34b-Python.
Code Infilling
Code Llama and Code Llama - Instruct 7B and 13B models are capable of filling in code given the surrounding context.
See example_infilling.py for some examples. The CodeLlama-7b model can be run for infilling with the command below (nproc_per_node needs to be set to the MP value):
torchrun --nproc_per_node 1 example_infilling.py \
--ckpt_dir CodeLlama-7b/ \
--tokenizer_path CodeLlama-7b/tokenizer.model \
--max_seq_len 192 --max_batch_size 4
Pretrained infilling models are: the Code Llama models CodeLlama-7b and CodeLlama-13b, and the Code Llama - Instruct models CodeLlama-7b-Instruct and CodeLlama-13b-Instruct.
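For reference, the infilling-capable checkpoints are trained with a fill-in-the-middle prompt layout in which the code before and after the hole is marked with special tokens. The sketch below only illustrates that layout as it is commonly documented for Code Llama (the <PRE>/<SUF>/<MID> tokens); example_infilling.py assembles the actual prompt for you, and the exact token handling may differ:

# Illustrative fill-in-the-middle prompt layout for Code Llama (commonly
# documented special tokens <PRE>, <SUF>, <MID>); this is a sketch of the idea,
# not the exact assembly performed by example_infilling.py.
prefix = 'def remove_non_ascii(s: str) -> str:\n    """'
suffix = '\n    return result\n'

# The model generates the missing middle part that connects prefix and suffix.
infilling_prompt = f"<PRE> {prefix} <SUF>{suffix} <MID>"
print(infilling_prompt)
# Generation stops at an end-of-infilling token; the completed function is
# prefix + generated_middle + suffix.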
Fine-tuned Instruction Models
Code Llama - Instruct models are fine-tuned to follow instructions. To get the expected features and performance for them, a specific formatting defined in chat_completion needs to be followed, including the INST and <<SYS>> tags, BOS and EOS tokens, and the whitespaces and linebreaks in between (we recommend calling strip() on inputs to avoid double-spaces).
You can use chat_completion directly to generate answers with the instruct model.
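For illustration, the chat template that chat_completion applies follows the commonly documented Llama 2 [INST] / <<SYS>> layout, sketched below. In practice you pass plain role/content messages to chat_completion and let it handle the special tokens and whitespace:

# Rough sketch of the [INST] / <<SYS>> layout assembled for Code Llama - Instruct
# (commonly documented Llama 2 chat format). chat_completion builds this,
# including BOS/EOS tokens, from [{"role": ..., "content": ...}] dialogs.
system = "Provide answers in Python."
user = "Write a function that reverses a linked list."

prompt = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user.strip()} [/INST]"
print(prompt)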
You can also deploy additional classifiers for filtering out inputs and outputs that are deemed unsafe. See the llama-recipes repo for an example of how to add a safety checker to the inputs and outputs of your inference code.
Examples using CodeLlama-7b-Instruct:
torchrun --nproc_per_node 1 example_instructions.py \
--ckpt_dir CodeLlama-7b-Instruct/ \
--tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
--max_seq_len 512 --max_batch_size 4
Fine-tuned instruction-following models are: the Code Llama - Instruct models CodeLlama-7b-Instruct, CodeLlama-13b-Instruct, and CodeLlama-34b-Instruct.
Code Llama is a new technology that carries potential risks with use. Testing conducted to date has not — and could not — cover all scenarios. In order to help developers address these risks, we have created the Responsible Use Guide. More details can be found in our research papers as well.
Issues
Please report any software “bug”, or other problems with the models through one of the following means:
- Reporting issues with the model: github.com/facebookresearch/codellama
- Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
- Reporting bugs and security concerns: facebook.com/whitehat/info
Model Card
See MODEL_CARD.md for the model card of Code Llama.
References
from https://github.com/facebookresearch/codellama
---------------------------------
Code Llama is fine-tuned from the Llama 2 base model and comes in three versions: the foundation model (Code Llama), a Python-specialized version (Code Llama - Python), and a natural-language instruction-tuned version (Code Llama - Instruct).
Each version is available in 7B, 13B, and 34B parameter sizes, and every model was trained on 500B tokens of code and code-related data.
Meta hopes Code Llama will inspire the community to build further on Llama 2 and become a new creative tool for research and commercial products.
Features
Supports a 100k-token context (large enough to fit an entire project).
Supports Python, C++, Java, PHP, TypeScript (JavaScript), SQL, C#, Bash, and other languages.
The Python 34B version scores 53.7% on HumanEval and 56.2% on MBPP, beating GPT-3.5's 48.1% and 52.2%.
Open source and licensed for commercial use.
Surprisingly, Code Llama also has an unreleased "Unnatural" version whose performance already surpasses ChatGPT and approaches GPT-4.
------------------------
Meta open-sources the coding LLM Code Llama, with performance approaching GPT-4

Takeaways
- Code Llama is a state-of-the-art LLM capable of generating code, and natural language about code, from both code and natural language prompts.
- Code Llama is free for research and commercial use.
- Code Llama is built on top of Llama 2 and is available in three models:
- Code Llama, the foundational code model;
- Code Llama - Python, specialized for Python;
- and Code Llama - Instruct, which is fine-tuned for understanding natural language instructions.
- In our own benchmark testing, Code Llama outperformed state-of-the-art publicly available LLMs on code tasks.
Today, we are releasing Code Llama, a large language model (LLM) that can use text prompts to generate code. Code Llama is state-of-the-art for publicly available LLMs on code tasks, and has the potential to make workflows faster and more efficient for current developers and lower the barrier to entry for people who are learning to code. Code Llama has the potential to be used as a productivity and educational tool to help programmers write more robust, well-documented software.
The generative AI space is evolving rapidly, and we believe an open approach to today’s AI is the best one for developing new AI tools that are innovative, safe, and responsible. We are releasing Code Llama under the same community license as Llama 2.
How Code Llama works
Code Llama is a code-specialized version of Llama 2 that was created by further training Llama 2 on its code-specific datasets, sampling more data from that same dataset for longer. Essentially, Code Llama features enhanced coding capabilities, built on top of Llama 2. It can generate code, and natural language about code, from both code and natural language prompts (e.g., “Write me a function that outputs the fibonacci sequence.”) It can also be used for code completion and debugging. It supports many of the most popular languages being used today, including Python, C++, Java, PHP, Typescript (Javascript), C#, and Bash.

We are releasing three sizes of Code Llama with 7B, 13B, and 34B parameters respectively. Each of these models is trained with 500B tokens of code and code-related data. The 7B and 13B base and instruct models have also been trained with fill-in-the-middle (FIM) capability, allowing them to insert code into existing code, meaning they can support tasks like code completion right out of the box.
The three models address different serving and latency requirements. The 7B model, for example, can be served on a single GPU. The 34B model returns the best results and allows for better coding assistance, but the smaller 7B and 13B models are faster and more suitable for tasks that require low latency, like real-time code completion.

The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.
Aside from being a prerequisite for generating longer programs, having longer input sequences unlocks exciting new use cases for a code LLM. For example, users can provide the model with more context from their codebase to make the generations more relevant. It also helps in debugging scenarios in larger codebases, where staying on top of all code related to a concrete issue can be challenging for developers. When developers are faced with debugging a large chunk of code they can pass the entire length of the code into the model.

Additionally, we have further fine-tuned two additional variations of Code Llama: Code Llama - Python and Code Llama - Instruct.
Code Llama - Python is a language-specialized variation of Code Llama, further fine-tuned on 100B tokens of Python code. Because Python is the most benchmarked language for code generation – and because Python and PyTorch play an important role in the AI community – we believe a specialized model provides additional utility.
Code Llama - Instruct is an instruction fine-tuned and aligned variation of Code Llama. Instruction tuning continues the training process, but with a different objective. The model is fed a “natural language instruction” input and the expected output. This makes it better at understanding what humans expect out of their prompts. We recommend using Code Llama - Instruct variants whenever using Code Llama for code generation since Code Llama - Instruct has been fine-tuned to generate helpful and safe answers in natural language.
We do not recommend using Code Llama or Code Llama - Python to perform general natural language tasks since neither of these models are designed to follow natural language instructions. Code Llama is specialized for code-specific tasks and isn’t appropriate as a foundation model for other tasks.
When using the Code Llama models, users must abide by our license and acceptable use policy.

Evaluating Code Llama’s performance
To test Code Llama’s performance against existing solutions, we used two popular coding benchmarks: HumanEval and Mostly Basic Python Programming (MBPP). HumanEval tests the model’s ability to complete code based on docstrings and MBPP tests the model’s ability to write code based on a description.
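For context, a HumanEval-style problem gives the model a function signature and docstring and checks its completion against hidden unit tests. The illustrative example below is not taken from the benchmark suite; it only shows the shape of such a task and one body a model might produce:

# Illustrative HumanEval-style task: the model receives the signature and
# docstring and must generate the body, which is then checked by unit tests.
def has_close_elements(numbers: list[float], threshold: float) -> bool:
    """Check if any two numbers in the list are closer to each other than
    the given threshold."""
    # A body a model might produce:
    return any(
        abs(a - b) < threshold
        for i, a in enumerate(numbers)
        for b in numbers[i + 1:]
    )

assert has_close_elements([1.0, 2.8, 3.0], 0.3) is True
assert has_close_elements([1.0, 2.0, 3.0], 0.5) is False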
Our benchmark testing showed that Code Llama performed better than open-source, code-specific LLMs and outperformed Llama 2. Code Llama 34B, for example, scored 53.7% on HumanEval and 56.2% on MBPP, the highest compared with other state-of-the-art open solutions, and on par with ChatGPT.

As with all cutting edge technology, Code Llama comes with risks. Building AI models responsibly is crucial, and we undertook numerous safety measures before releasing Code Llama. As part of our red teaming efforts, we ran a quantitative evaluation of Code Llama’s risk of generating malicious code. We created prompts that attempted to solicit malicious code with clear intent and scored Code Llama’s responses to those prompts against ChatGPT’s (GPT3.5 Turbo). Our results found that Code Llama answered with safer responses.
Details about our red teaming efforts from domain experts in responsible AI, offensive security engineering, malware development, and software engineering are available in our research paper.
Releasing Code Llama
Programmers are already using LLMs to assist in a variety of tasks, ranging from writing new software to debugging existing code. The goal is to make developer workflows more efficient, so they can focus on the most human centric aspects of their job, rather than repetitive tasks.
At Meta, we believe that AI models, but LLMs for coding in particular, benefit most from an open approach, both in terms of innovation and safety. Publicly available, code-specific models can facilitate the development of new technologies that improve people's lives. By releasing code models like Code Llama, the entire community can evaluate their capabilities, identify issues, and fix vulnerabilities.
Code Llama’s training recipes are available on our Github repository.
Model weights are also available.
Responsible use
Our research paper discloses details of Code Llama’s development as well as how we conducted our benchmarking tests. It also provides more information into the model’s limitations, known challenges we encountered, mitigations we’ve taken, and future challenges we intend to investigate.
We’ve also updated our Responsible Use Guide and it includes guidance on developing downstream models responsibly, including:
- Defining content policies and mitigations.
- Preparing data.
- Fine-tuning the model.
- Evaluating and improving performance.
- Addressing input- and output-level risks.
- Building transparency and reporting mechanisms in user interactions.
Developers should evaluate their models using code-specific evaluation benchmarks and perform safety studies on code-specific use cases such as generating malware, computer viruses, or malicious code. We also recommend leveraging safety datasets for automatic and human evaluations, and red teaming on adversarial prompts.
The future of generative AI for coding
Code Llama is designed to support software engineers in all sectors – including research, industry, open source projects, NGOs, and businesses. But there are still many more use cases to support than what our base and instruct models can serve.
We hope that Code Llama will inspire others to leverage Llama 2 to create new innovative tools for research and commercial products.
Try Code Llama today
Download the Code Llama model | Read the research paper
-------------------------------------------------------------------
Deploying llama3 locally with Ollama and using the model through open-webui
Llama 3 is an open-source language model family released by Meta (Facebook) AI, consisting of an 8B (8 billion parameter) model and a 70B (70 billion parameter) model. Llama 3 supports a wide range of commercial and research uses and has demonstrated excellent performance on multiple industry-standard benchmarks.
Llama 3 uses an optimized auto-regressive Transformer architecture designed for complex text-generation tasks, improving the coherence and relevance of the generated text. The model combines supervised fine-tuning (SFT) with reinforcement learning from human feedback (RLHF); this hybrid approach improves both helpfulness and safety, making the model more reliable and better aligned with user expectations in real applications.
1. Install Ollama
Ollama website: https://ollama.com/
Download the Ollama build for your operating system.
Download page: https://ollama.com/download
After downloading, double-click the installer to run it and accept all the default options.
2. Download an AI model through Ollama
Open the Ollama model library: https://ollama.com/library
Find llama3, open its page, and copy the install command (the same steps apply to any other AI model):
ollama run llama3
Press Win+R, type "cmd" and press Enter, then right-click to paste the command you just copied and press Enter again to download and install llama3.
Other commonly used ollama commands include:
serve Start ollama
create Create a model from a Modelfile
show Show information for a model
run Run a model
pull Pull a model from a registry
push Push a model to a registry
list List models
cp Copy a model
rm Remove a model
help Help about any command
When you see the prompt below, the installation is complete; you can then type your question directly and start chatting.
The next time you want to chat, just run ollama run llama3 in a CMD window to start a new conversation.
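Besides the interactive CLI, Ollama also listens on a local HTTP API (by default http://localhost:11434), which is what web front-ends such as open-webui talk to. A minimal sketch of calling it from Python:

# Minimal sketch: query the local Ollama HTTP API (default port 11434),
# the same interface that front-ends such as open-webui use.
import json
import urllib.request

payload = {
    "model": "llama3",
    "prompt": "Why is the sky blue?",
    "stream": False,          # return the full answer in a single JSON response
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])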
Still, using the model from the CMD command line like this is inconvenient, so I install a free, open-source web interface such as open-webui or lobe-chat to use with it.
3. Download, install, and use open-webui
1. Enable Microsoft Hyper-V: open "Control Panel -> Programs -> Turn Windows features on or off".
2. Install the Docker environment.
Docker website: https://docker.com/
Download Docker Desktop.
Download page: https://docs.docker.com/desktop/install/windows-install/
After downloading, double-click the .exe to run it. Make sure to uncheck "Use WSL 2 instead of Hyper-V (recommended)"; otherwise it can cause a lot of problems (lesson learned the hard way).
Wait for the installation to finish.
When installation completes, click "Close and restart" to reboot the computer.
After the system restarts, double-click the "Docker Desktop" icon on the desktop and click "Accept" in the pop-up.
Click "Continue without signing in" to enter without logging in.
Click "Skip survey".
You will land on the Docker Desktop interface.
Switch to China-based registry mirrors (Settings ⚙ -> Docker Engine): paste the content below and click "Apply & restart" to save and restart Docker.
{
  "registry-mirrors": [
    "https://82m9ar63.mirror.aliyuncs.com",
    "http://hub-mirror.c.163.com",
    "https://docker.mirrors.ustc.edu.cn"
  ],
  "builder": {
    "gc": {
      "defaultKeepStorage": "20GB",
      "enabled": true
    }
  },
  "experimental": false,
  "features": {
    "buildkit": true
  }
}
3. Install the open-webui service with Docker.
Press Win+R, type "cmd" and press Enter, then right-click to paste the command below and press Enter.
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Depending on your network, this step may take quite a while; please be patient until the screen below appears.
Open the Docker Desktop interface again and you will see the open-webui service already running.
Click the port number under "Port(s)" to open the open-webui web page.
Click "Sign up" to open the registration page, enter your details, and click "Create Account"; the first account registered becomes the administrator.
After registering, click the settings ⚙ in the top-right corner -> General -> Language and select Chinese if you want to switch the interface to Chinese.
You can then select the previously deployed llama3:8b model directly and start using it.
from http://web.archive.org/web/20240613014428/https://www.sunweihu.com/8838.html
--------------------------------------
The Llama 3 large model is open source!
Llama 3 is the latest large language model released by Meta, intended to let individuals, creators, researchers, and businesses of all sizes experiment, innovate, and scale their ideas responsibly.
Compared with previously released open models, Llama 3 offers:
Data volume: trained on more than 7x the data of the Llama 2 dataset.
Stronger capabilities: improved reasoning and coding abilities.
Training efficiency: 3x higher than Llama 2.
Model sizes: pretrained and instruction-tuned Llama 3 language models ranging from 8B to 70B parameters.
Download and usage: a guide for downloading the model weights and tokenizer, plus quick-start steps for running the model locally.
Model parallelism: different model sizes require different model-parallel (MP) values.
License: the models and weights are open to researchers and commercial entities, aiming to promote discovery and ethical AI progress.
Repository: https://github.com/meta-llama/llama3
( The official Meta Llama 3 GitHub site
Models on Hugging Face | Blog | Website | Get Started
We are unlocking the power of large language models. Our latest version of Llama is now accessible to individuals, creators, researchers, and businesses of all sizes so that they can experiment, innovate, and scale their ideas responsibly.
This release includes model weights and starting code for pre-trained and instruction-tuned Llama 3 language models — including sizes of 8B to 70B parameters.
This repository is a minimal example of loading Llama 3 models and running inference. For more detailed examples, see llama-recipes.
To download the model weights and tokenizer, please visit the Meta Llama website and accept our License.
Once your request is approved, you will receive a signed URL over email. Then, run the download.sh script, passing the URL provided when prompted to start the download.
Pre-requisites: Ensure you have wget and md5sum installed. Then run the script: ./download.sh.
Remember that the links expire after 24 hours and a certain amount of downloads. You can always re-request a link if you start seeing errors such as 403: Forbidden.
We also provide downloads on Hugging Face, in both transformers and native llama3 formats. To download the weights from Hugging Face, please follow these steps:
- Visit one of the repos, for example meta-llama/Meta-Llama-3-8B-Instruct.
- Read and accept the license. Once your request is approved, you'll be granted access to all the Llama 3 models. Note that requests used to take up to one hour to get processed.
- To download the original native weights to use with this repo, click on the "Files and versions" tab and download the contents of the original folder. You can also download them from the command line if you pip install huggingface-hub:
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original/*" --local-dir meta-llama/Meta-Llama-3-8B-Instruct
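If you prefer to script the download, the same filter can be expressed with the huggingface_hub Python package. A sketch, assuming your license request has been approved and you are logged in with a token (e.g. via huggingface-cli login):

# Sketch: download only the native ("original") Llama 3 weights via huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    allow_patterns=["original/*"],               # only the native-format weights
    local_dir="meta-llama/Meta-Llama-3-8B-Instruct",
)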
You can follow the steps below to get up and running with Llama 3 models quickly. These steps will let you run quick inference locally. For more examples, see the Llama recipes repository.
- Clone and download this repository in a conda env with PyTorch / CUDA.
- In the top-level directory run:
pip install -e .
- Visit the Meta Llama website and register to download the model/s.
- Once registered, you will get an email with a URL to download the models. You will need this URL when you run the download.sh script.
- Once you get the email, navigate to your downloaded llama repository and run the download.sh script.
  - Make sure to grant execution permissions to the download.sh script.
  - During this process, you will be prompted to enter the URL from the email.
  - Do not use the “Copy Link” option; copy the link from the email manually.
- Once the model/s you want have been downloaded, you can run the model locally using the command below:
torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir Meta-Llama-3-8B-Instruct/ \
--tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
--max_seq_len 512 --max_batch_size 6
Note
- Replace Meta-Llama-3-8B-Instruct/ with the path to your checkpoint directory and Meta-Llama-3-8B-Instruct/tokenizer.model with the path to your tokenizer model.
- The --nproc_per_node should be set to the MP value for the model you are using.
- Adjust the max_seq_len and max_batch_size parameters as needed.
- This example runs the example_chat_completion.py found in this repository, but you can change that to a different .py file.
Different models require different model-parallel (MP) values:
Model | MP |
---|---|
8B | 1 |
70B | 8 |
All models support sequence lengths up to 8192 tokens, but we pre-allocate the cache according to the max_seq_len and max_batch_size values. So set those according to your hardware.
These models are not finetuned for chat or Q&A. They should be prompted so that the expected answer is the natural continuation of the prompt.
See example_text_completion.py for some examples. To illustrate, see the command below to run it with the llama-3-8b model (nproc_per_node needs to be set to the MP value):
torchrun --nproc_per_node 1 example_text_completion.py \
--ckpt_dir Meta-Llama-3-8B/ \
--tokenizer_path Meta-Llama-3-8B/tokenizer.model \
--max_seq_len 128 --max_batch_size 4
The fine-tuned models were trained for dialogue applications. To get the expected features and performance for them, specific formatting defined in ChatFormat needs to be followed: the prompt begins with a <|begin_of_text|> special token, after which one or more messages follow. Each message starts with the <|start_header_id|> tag, the role (system, user, or assistant), and the <|end_header_id|> tag. After a double newline \n\n, the message's contents follow. The end of each message is marked by the <|eot_id|> token.
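Put together, a single-turn prompt in this format looks roughly as follows. This is only an illustrative sketch of the layout described above; ChatFormat in this repo builds the actual token sequence from role/content messages for you:

# Illustrative sketch of the Llama 3 chat prompt layout described above.
# In practice, ChatFormat / example_chat_completion.py assemble this, including
# the special tokens, from [{"role": ..., "content": ...}] messages.
system = "You are a helpful assistant."
user = "What is the capital of France?"

prompt = (
    "<|begin_of_text|>"
    f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
    f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
print(prompt)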
You can also deploy additional classifiers to filter out inputs and outputs that are deemed unsafe. See the llama-recipes repo for an example of how to add a safety checker to the inputs and outputs of your inference code.
Examples using llama-3-8b-chat:
torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir Meta-Llama-3-8B-Instruct/ \
--tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
--max_seq_len 512 --max_batch_size 6
Llama 3 is a new technology that carries potential risks with use. Testing conducted to date has not — and could not — cover all scenarios. To help developers address these risks, we have created the Responsible Use Guide.
Please report any software “bug” or other problems with the models through one of the following means:
- Reporting issues with the model: https://github.com/meta-llama/llama3/issues
- Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
- Reporting bugs and security concerns: facebook.com/whitehat/info
from https://github.com/meta-llama/llama3 )
------------------------------------------------------
Related post:
https://briteming.blogspot.com/2024/04/maxkb-llm.html
---------------------------------------------------------
A framework for building AI assistants
Phidata is a framework for building AI assistants with memory, knowledge, and tools, designed to address the context limitations of large language models (LLMs) and their inability to take actions. It works as follows (a rough sketch of this pattern appears after the repository link below):
Memory: stores chat history in a database so the LLM can hold long-running conversations.
Knowledge: stores information in a vector database to provide the LLM with context.
Tools: let the LLM take actions such as pulling data from an API, sending an email, or querying a database.
Repository: https://github.com/phidatahq/phidata
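The sketch below is not Phidata's actual API; it is a generic illustration, under assumed names, of the memory / knowledge / tools pattern described in the list above (chat history in SQLite, context retrieval, and a placeholder model call):

# Generic illustration of the memory / knowledge / tools pattern (NOT Phidata's API).
# All names here (Assistant, search_knowledge, call_llm) are assumptions for the sketch.
import sqlite3

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM call (e.g. a local Ollama or a hosted API).
    return f"[model answer to: {prompt[:40]}...]"

def search_knowledge(query: str) -> list[str]:
    # Placeholder for a vector-database lookup returning relevant snippets.
    return [f"(retrieved context for '{query}')"]

class Assistant:
    def __init__(self, db_path: str = "chat_history.db"):
        self.db = sqlite3.connect(db_path)  # memory: persistent chat history
        self.db.execute("CREATE TABLE IF NOT EXISTS history (role TEXT, content TEXT)")

    def run(self, user_message: str) -> str:
        context = "\n".join(search_knowledge(user_message))          # knowledge
        history = self.db.execute("SELECT role, content FROM history").fetchall()
        prompt = f"History: {history}\nContext: {context}\nUser: {user_message}"
        answer = call_llm(prompt)   # a real framework could also route to a tool here
        self.db.execute("INSERT INTO history VALUES (?, ?)", ("user", user_message))
        self.db.execute("INSERT INTO history VALUES (?, ?)", ("assistant", answer))
        self.db.commit()
        return answer

print(Assistant().run("Summarize yesterday's sales numbers."))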
----------------------------------------------------------
An open-source RAG engine
RAGFlow is open-sourced by the developer infiniflow and has already collected 5.2K stars. The project is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding, offering a streamlined RAG workflow for businesses of all sizes.
Its key features include:
High-quality input and output: deep document understanding and knowledge extraction from unstructured data with complex formats.
Template-based chunking: intelligent and explainable template options.
Grounded citations: fewer hallucinations, with visualized text chunking that allows human intervention and quick access to key references and traceable citations supporting fact-based answers.
Heterogeneous data source compatibility: supports Word, PPT, Excel, TXT, images, scanned copies, structured data, web pages, and more.
Automated RAG workflow: streamlined RAG orchestration tailored to both individuals and large enterprises, with configurable LLMs and embedding models, multiple recall paired with fused re-ranking, and intuitive APIs for seamless business integration.
Repository: https://github.com/infiniflow/ragflow
( RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.
Try our demo at https://demo.ragflow.io.
- 2024-07-08 Supports workflow based on Graph.
- 2024-06-27 Supports Markdown and Docx in the Q&A parsing method.
- 2024-06-27 Supports extracting images from Docx files.
- 2024-06-27 Supports extracting tables from Markdown files.
- 2024-06-14 Supports PDF in the Q&A parsing method.
- 2024-06-06 Supports Self-RAG, which is enabled by default in dialog settings.
- 2024-05-30 Integrates BCE and BGE reranker models.
- 2024-05-28 Supports LLM Baichuan and VolcanoArk.
- 2024-05-23 Supports RAPTOR for better text retrieval.
- 2024-05-21 Supports streaming output and text chunk retrieval API.
- 2024-05-15 Integrates OpenAI GPT-4o.
- Deep document understanding-based knowledge extraction from unstructured data with complicated formats.
- Finds "needle in a data haystack" of literally unlimited tokens.
- Intelligent and explainable.
- Plenty of template options to choose from.
- Visualization of text chunking to allow human intervention.
- Quick view of the key references and traceable citations to support grounded answers.
- Supports Word, slides, excel, txt, images, scanned copies, structured data, web pages, and more.
- Streamlined RAG orchestration catered to both personal and large businesses.
- Configurable LLMs as well as embedding models.
- Multiple recall paired with fused re-ranking.
- Intuitive APIs for seamless integration with business.
- CPU >= 4 cores
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1
If you have not installed Docker on your local machine (Windows, Mac, or Linux), see Install Docker Engine.
Ensure vm.max_map_count >= 262144:
To check the value of vm.max_map_count:
$ sysctl vm.max_map_count
Reset vm.max_map_count to a value of at least 262144 if it is not.
# In this case, we set it to 262144:
$ sudo sysctl -w vm.max_map_count=262144
This change will be reset after a system reboot. To ensure your change remains permanent, add or update the vm.max_map_count value in /etc/sysctl.conf accordingly:
vm.max_map_count=262144
Clone the repo:
$ git clone https://github.com/infiniflow/ragflow.git
Build the pre-built Docker images and start up the server:
Running the following commands automatically downloads the dev version RAGFlow Docker image. To download and run a specified Docker version, update RAGFLOW_VERSION in docker/.env to the intended version, for example RAGFLOW_VERSION=v0.8.0, before running the following commands.
$ cd ragflow/docker
$ chmod +x ./entrypoint.sh
$ docker compose up -d
The core image is about 9 GB in size and may take a while to load.
Check the server status after having the server up and running:
$ docker logs -f ragflow-server
The following output confirms a successful launch of the system:
____ ______ __
/ __ \ ____ _ ____ _ / ____// /____ _ __
/ /_/ // __ `// __ `// /_ / // __ \| | /| / /
/ _, _// /_/ // /_/ // __/ / // /_/ /| |/ |/ /
/_/ |_| \__,_/ \__, //_/ /_/ \____/ |__/|__/
/____/
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:9380
* Running on http://x.x.x.x:9380
INFO:werkzeug:Press CTRL+C to quit
If you skip this confirmation step and directly log in to RAGFlow, your browser may prompt a network anomaly error because, at that moment, your RAGFlow may not be fully initialized.
- In your web browser, enter the IP address of your server and log in to RAGFlow.
  With the default settings, you only need to enter http://IP_OF_YOUR_MACHINE (sans port number), as the default HTTP serving port 80 can be omitted when using the default configurations.
- In service_conf.yaml, select the desired LLM factory in user_default_llm and update the API_KEY field with the corresponding API key. See llm_api_key_setup for more information.
The show is now on!
When it comes to system configurations, you will need to manage the following files:
- .env: Keeps the fundamental setups for the system, such as SVR_HTTP_PORT, MYSQL_PASSWORD, and MINIO_PASSWORD.
- service_conf.yaml: Configures the back-end services.
- docker-compose.yml: The system relies on docker-compose.yml to start up.
You must ensure that changes to the .env file are in line with what are in the service_conf.yaml file.
The ./docker/README file provides a detailed description of the environment settings and service configurations, and you are REQUIRED to ensure that all environment settings listed in the ./docker/README file are aligned with the corresponding configurations in the service_conf.yaml file.
To update the default HTTP serving port (80), go to docker-compose.yml and change 80:80 to <YOUR_SERVING_PORT>:80.
Updates to all system configurations require a system reboot to take effect:
$ docker-compose up -d
To build the Docker images from source:
$ git clone https://github.com/infiniflow/ragflow.git
$ cd ragflow/
$ docker build -t infiniflow/ragflow:dev .
$ cd ragflow/docker
$ chmod +x ./entrypoint.sh
$ docker compose up -d
To launch the service from source:
Clone the repository:
$ git clone https://github.com/infiniflow/ragflow.git
$ cd ragflow/
Create a virtual environment, ensuring that Anaconda or Miniconda is installed:
$ conda create -n ragflow python=3.11.0
$ conda activate ragflow
$ pip install -r requirements.txt
# If your CUDA version is higher than 12.0, run the following additional commands:
$ pip uninstall -y onnxruntime-gpu
$ pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
Copy the entry script and configure environment variables:
# Get the Python path:
$ which python
# Get the ragflow project path:
$ pwd
$ cp docker/entrypoint.sh .
$ vi entrypoint.sh
# Adjust configurations according to your actual situation (the following two export commands are newly added):
# - Assign the result of `which python` to `PY`.
# - Assign the result of `pwd` to `PYTHONPATH`.
# - Comment out `LD_LIBRARY_PATH`, if it is configured.
# - Optional: Add Hugging Face mirror.
PY=${PY}
export PYTHONPATH=${PYTHONPATH}
export HF_ENDPOINT=https://hf-mirror.com
Launch the third-party services (MinIO, Elasticsearch, Redis, and MySQL):
$ cd docker
$ docker compose -f docker-compose-base.yml up -d
Check the configuration files, ensuring that:
- The settings in docker/.env match those in conf/service_conf.yaml.
- The IP addresses and ports for related services in service_conf.yaml match the local machine IP and ports exposed by the container.
Launch the RAGFlow backend service:
$ chmod +x ./entrypoint.sh
$ bash ./entrypoint.sh
Launch the frontend service:
$ cd web
$ npm install --registry=https://registry.npmmirror.com --force
$ vim .umirc.ts
# Update proxy.target to http://127.0.0.1:9380
$ npm run dev
Deploy the frontend service:
$ cd web
$ npm install --registry=https://registry.npmmirror.com --force
$ umi build
$ mkdir -p /ragflow/web
$ cp -r dist /ragflow/web
$ apt install nginx -y
$ cp ../docker/nginx/proxy.conf /etc/nginx
$ cp ../docker/nginx/nginx.conf /etc/nginx
$ cp ../docker/nginx/ragflow.conf /etc/nginx/conf.d
$ systemctl start nginx
See the RAGFlow Roadmap 2024
from https://github.com/infiniflow/ragflow)