一分钟部署 Llama3 中文大模型，没别的，就是快

llama3 · 浏览次数 : 0

小编点评

## Deploying the Llama3-8B-Chinese-Chat model in 3 minutes **Here's how you can deploy the latest Llama3-8B-Chinese-Chat model in 3 minutes:** **1. Access the model deployment page:** - Go to **bja.sealos.run** and enter the template name as **llama3-8b-chinese** - Click "Right" above the template and select "Deploy Application" **2. Deploy the model:** - Click "Deployment Details" to monitor the progress - Once the instance status is "running", click "Application Details" **3. Access the model:** - Click "Apply" to access the deployed model - You'll receive the model's API URL, which you can copy **4. Use the model in 3 ways:** - **Web UI:** Use Lobe Chat, ChatGPT Next Web, or another Web UI. - **Terminal API:** Use the model's API directly in the terminal. - **Visual UI:** Open the model's visual interface. **Here's an example URL for Lobe Chat:** **OPENAI_PROXY_URL=your_internal_api_url_here/v1** **OPENAI_MODEL_LIST=+Llama3-8B-Chinese-Chat.q4_k_m.GGUF** **OPENAI_API_KEY=your_random_api_key** **Remember to replace the values with your actual information.** **Enjoy using the new Llama3-8B-Chinese-Chat model!**

正文

前段时间百度创始人李彦宏信誓旦旦地说开源大模型会越来越落后，闭源模型会持续领先。随后小扎同学就给了他当头一棒，向他展示了什么叫做顶级开源大模型。

美国当地时间4月18日，Meta 在官网上发布了两款开源大模型，参数分别达到 80 亿 (8B) 和 700 亿 (70B)，是目前同体量下性能最好的开源模型，而且直接逼近了一线顶级商业模型 GPT-4 和 Claude3。

与此同时，还有一个 400B 的超大杯模型还在路上，估计很快就会放出来，到时候就真的碾压了，某些声称闭源遥遥领先的哥们就等着哭吧 😢

虽然才过去短短几日，Huggingface 上已经涌现了非常多的 Llama3 中文微调版，令人眼花缭乱：

想不想自己也部署一个 Llama3 中文版？

对于没有 GPU 的同学，我们可以使用微调的量化模型来使用 CPU 运行。不同的量化方法会带来不同的性能损失：

8bit 量化没有性能损失。
AWQ 4bit 量化对 8B 模型来说有 2%性能损失，对 70B 模型只有 0.05%性能损失。
参数越大的模型，低 bit 量化损失越低。AWQ 3bit 70B 也只有 2.7%性能损失，完全可接受。

综合来说，如果追求无任何性能损失，8B 模型用 8bit 量化，70B 模型用 4bit 量化。

如果能接受 2-3%损失，8B 模型用 4bit 量化，70B 模型用 3bit 量化。

目前效果最好的中文微调版是 HuggingFace 社区的 zhouzr/Llama3-8B-Chinese-Chat-GGUF 模型，该模型采用 firefly-train-1.1M、moss-003-sft-data、school_math_0.25M、弱智吧（没错，就是那个弱智吧~）数据集，使模型能够使用中文回答用户的提问。

下面我们来看看如何在三分钟内快速部署这个模型吧。

直接在浏览器中打开以下链接：