## Chinese-LLaMA-Alpaca Wiki: Model Reconstruction
This document provides details and instructions for the inference script and the manual conversion process for the Chinese-LLaMA-Alpaca model.
### Inference Script
This script is run after the model conversion step. It takes the base (or merged) model path and, optionally, the LoRA model path as input and performs the following steps (a Python sketch follows the list):
1. Loads the base model and LoRA model weights and configurations.
2. Converts original PyTorch (`pth`) checkpoints to HF format where necessary.
3. Loads the merged model if one is available (otherwise the base model) and sets the tokenizer path.
4. Quantizes the model, if requested, while loading it.
5. Saves the resulting model for later inference runs.
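The loading and merging steps can be pictured with a short Python sketch using the Hugging Face `transformers` and `peft` libraries. The paths and dtype are placeholders, and the real script contains additional handling (format conversion, quantization) not shown here.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_model_path = "path_to_hf_llama"       # placeholder: HF-format base weights
lora_model_path = "path_to_chinese_lora"   # placeholder: LoRA weights

# Step 1: load the base model weights and configuration.
base = LlamaForCausalLM.from_pretrained(
    base_model_path, torch_dtype=torch.float16, low_cpu_mem_usage=True
)

# Step 3: take the tokenizer from the LoRA directory, which carries the
# extended Chinese vocabulary, and resize the embeddings to match it.
tokenizer = LlamaTokenizer.from_pretrained(lora_model_path)
base.resize_token_embeddings(len(tokenizer))

# Apply the LoRA weights on top of the base model for inference.
model = PeftModel.from_pretrained(base, lora_model_path, torch_dtype=torch.float16)
model.eval()
```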
**Parameters:**
* `base_model`: Path to the base model weight directory.
* `lora_model`: Path to the LoRA model weight directory.
* `with_prompt`: A boolean flag indicating whether to wrap the input in the instruction prompt template (see the sketch after the note below).
* `interactive`: A flag indicating whether to run the script interactively.
**Note:** This script assumes that the `merge_llama_with_chinese_lora_to_hf.py` script has already been executed.
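As an illustration of the `with_prompt` flag, the sketch below wraps a raw input in an Alpaca-style instruction template before it is fed to the model. The template text is the standard Alpaca one and is an assumption here; the script's exact template may differ.

```python
# Assumed Alpaca-style template; the script's actual template may differ.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response: "
)

def build_input(raw_text: str, with_prompt: bool = True) -> str:
    """Return the text that is actually fed to the model."""
    if with_prompt:
        return PROMPT_TEMPLATE.format(instruction=raw_text)
    return raw_text

print(build_input("Tell me about the Great Wall."))
```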
### Manual Conversion
This process involves merging the LoRA weights with the base model to create the final HF model.
1. Run `merge_llama_with_chinese_lora.py` to merge the LoRA weights into the base model.
2. The script writes the complete HF model weights to the output directory, e.g. `path_to_merged_chinese_alpaca_plus`.
3. Load the merged model for inference using `inference_hf.py` or `gradio_demo.py` (see the sketch below).
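A minimal sketch of step 3, assuming the merged weights sit in the directory from step 2; the sample prompt and generation settings are illustrative only, and device placement via `device_map="auto"` assumes the `accelerate` package is installed.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

merged_dir = "path_to_merged_chinese_alpaca_plus"

# Load the tokenizer and the merged HF weights.
tokenizer = LlamaTokenizer.from_pretrained(merged_dir)
model = LlamaForCausalLM.from_pretrained(
    merged_dir, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Generate a short completion from a sample prompt.
inputs = tokenizer("Who wrote Romance of the Three Kingdoms?", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```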
**Additional Notes:**
* Ensure the machine has sufficient memory (32GB or more) to load and merge the full model.
* Decoding results may not exactly match those of `llama.cpp`, since the two frameworks implement inference differently.
### Conclusion
This documentation provides a detailed overview of the inference script and the manual conversion process for the Chinese-LLaMA-Alpaca model. Follow the instructions carefully to achieve successful model conversion and inference.