This is a quick guide for getting Dolly running on an Ubuntu machine with Nvidia GPUs.
You’ll need a good internet connection and around 35GB of disk space for the Nvidia driver, Dolly (the 12b model) and extras. You can use a smaller model to take up less space: the 7 billion parameter model uses about 14GB, while the 3 billion parameter model needs around 6GB.
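Before downloading anything, you can check that the drive holding the Hugging Face cache has enough free space. This is a minimal sketch using Python's standard shutil.disk_usage. The sizes are the rough figures from this guide, and the cache location (~/.cache/huggingface) is an assumption that you may need to adjust.

```python
import shutil
from pathlib import Path

# Rough space needed in GB, from the figures in this guide.
# The 12b figure includes the driver and extras; the others are the model alone.
REQUIRED_GB = {
    "databricks/dolly-v2-12b": 35,
    "databricks/dolly-v2-7b": 14,
    "databricks/dolly-v2-3b": 6,
}


def free_gb(path: Path) -> float:
    """Return the free space in GB on the filesystem that contains `path`."""
    # disk_usage needs an existing path, so walk up until we find one.
    while not path.exists():
        path = path.parent
    return shutil.disk_usage(path).free / 1e9


def has_room_for(model: str, path: Path = Path.home() / ".cache" / "huggingface") -> bool:
    """True if the drive at `path` has at least the rough space `model` needs."""
    return free_gb(path) >= REQUIRED_GB[model]


if __name__ == "__main__":
    for name in REQUIRED_GB:
        print(f"{name}: {'ok' if has_room_for(name) else 'not enough space'}")
```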
Install Nvidia Drivers and CUDA
sudo ubuntu-drivers autoinstall
sudo apt install nvidia-cuda-toolkit
Reboot to activate the Nvidia driver
sudo reboot
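After rebooting, you can confirm that the driver is loaded by asking nvidia-smi for each GPU's name and total memory. The Python sketch below calls nvidia-smi with its --query-gpu and --format=csv,noheader options and parses the result. The helper names parse_gpu_csv and query_gpus are illustrative, not part of any library.

```python
import shutil
import subprocess


def parse_gpu_csv(text: str) -> list[tuple[str, int]]:
    """Parse lines like 'NVIDIA GeForce RTX 3090, 24576 MiB' into (name, MiB) pairs."""
    gpus = []
    for line in text.strip().splitlines():
        # Split on the last comma so GPU names containing commas stay intact.
        name, memory = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(memory.split()[0])))
    return gpus


def query_gpus() -> list[tuple[str, int]]:
    """Return the GPUs reported by nvidia-smi, or [] if the tool is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("nvidia-smi not found: is the driver installed and the machine rebooted?")
    for name, mib in gpus:
        print(f"{name}: {mib} MiB")
```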
Install Python
Python should already be installed, but we need to install pip. Once pip is installed, install numpy, accelerate, and transformers. Quote the version specifiers so the shell does not treat > as an output redirect.
sudo apt install python3-pip
pip install numpy
pip install "accelerate>=0.12.0" "transformers[torch]==4.25.1"
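To confirm that the installs worked and see which versions pip picked, you can read the installed package metadata with Python's standard importlib.metadata. This sketch only reports versions; it does not compare them against the specifiers above. The package list assumes torch came in through the transformers[torch] extra.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Distributions installed in the steps above (torch comes from transformers[torch]).
PACKAGES = ["numpy", "accelerate", "transformers", "torch"]


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```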
Run Dolly
Start a Python console.
python3
Run the following commands to set up Dolly.
import torch
from transformers import pipeline

generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")

# Alternatively, to use a smaller model, run the line below (databricks/dolly-v2-7b works the same way)
generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
Notes:
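- If loading fails with an error asking for an offload folder, specify one: pass model_kwargs={"offload_folder": "./offloadfolder"} to pipeline, or offload_folder="./offloadfolder" to from_pretrained. An SSD is preferable for this folder.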
- If you have lots of RAM, you can remove torch_dtype=torch.bfloat16. The model then loads in 32-bit floats, which takes about twice as much memory.
- If you have 32GB of RAM or less, you may only be able to run the smallest (3b) model.
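The memory notes above come down to simple arithmetic: the weights take roughly (number of parameters) × (bytes per parameter), where bfloat16 uses 2 bytes per parameter and the default float32 uses 4. The sketch below estimates the weights-only footprint of each model. The parameter counts are the nominal ones from the model names, and loading needs extra headroom on top of these numbers, which is why the notes suggest more RAM than the estimates alone.

```python
# Nominal parameter counts in billions, taken from the model names.
DOLLY_MODELS = {
    "databricks/dolly-v2-12b": 12,
    "databricks/dolly-v2-7b": 7,
    "databricks/dolly-v2-3b": 3,
}

# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"bfloat16": 2, "float32": 4}


def weights_gb(params_billion: float, dtype: str = "bfloat16") -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 parameters) * bytes per parameter / 1e9 bytes per GB
    return params_billion * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for name, params in DOLLY_MODELS.items():
        print(
            f"{name}: ~{weights_gb(params):.0f} GB in bfloat16, "
            f"~{weights_gb(params, 'float32'):.0f} GB in float32"
        )
```

For the 12b model this gives about 24GB in bfloat16 and 48GB in float32, which is why dropping torch_dtype=torch.bfloat16 only makes sense with lots of RAM.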
Alternatively, if we don’t want to use trust_remote_code, we can download instruct_pipeline.py from the model’s Hugging Face repository, save it in the working directory, and run the following:
import torch
# instruct_pipeline.py must be in the current working directory
from instruct_pipeline import InstructionTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-12b", device_map="auto", torch_dtype=torch.bfloat16)

generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
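If you would rather fetch instruct_pipeline.py from Python than from a browser, the sketch below builds the file's URL using Hugging Face's usual resolve/main/<file> pattern and saves it with the standard library's urllib. It assumes the file sits at the root of the model repository on the main branch; check the model page if the download fails. Both helper names are hypothetical.

```python
import urllib.request
from pathlib import Path


def instruct_pipeline_url(model: str = "databricks/dolly-v2-12b") -> str:
    """Build the raw-file URL, assuming the file is at the repo root on the main branch."""
    return f"https://huggingface.co/{model}/resolve/main/instruct_pipeline.py"


def download_instruct_pipeline(
    model: str = "databricks/dolly-v2-12b",
    dest: Path = Path("instruct_pipeline.py"),
) -> Path:
    """Save instruct_pipeline.py to `dest` (the working directory by default) so it can be imported."""
    urllib.request.urlretrieve(instruct_pipeline_url(model), dest)
    return dest
```

Call download_instruct_pipeline() once from the same directory you start the Python console in, then run the import above.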
Now we can ask Dolly a question.
generate_text("Your question?")
Example:
>>> generate_text("Tell me about Databricks dolly-v2-3b?")
'Dolly is the fully managed open-source engine that allows you to rapidly build, test, and deploy machine learning models, all on your own infrastructure.'
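The example above shows generate_text returning a plain string. Some revisions of instruct_pipeline.py return a list of dictionaries with a generated_text key instead (an assumption to verify against your copy of the file), so the hypothetical helpers below accept both shapes and make it easy to ask several questions in a row.

```python
from typing import Any, Callable


def answer_text(result: Any) -> str:
    """Extract the answer from a pipeline result, whether a str or [{'generated_text': ...}]."""
    if isinstance(result, str):
        return result
    if isinstance(result, list) and result and isinstance(result[0], dict):
        return result[0]["generated_text"]
    raise TypeError(f"Unexpected pipeline output: {result!r}")


def ask_all(generate_text: Callable[[str], Any], questions: list[str]) -> dict[str, str]:
    """Ask each question in turn and map it to Dolly's answer."""
    return {q: answer_text(generate_text(q)) for q in questions}
```

In the console, something like ask_all(generate_text, ["What is Databricks?", "Explain bfloat16 in one sentence."]) returns each question paired with its answer.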
Further information is available at the following three links.
https://github.com/databrickslabs/dolly
https://huggingface.co/databricks/dolly-v2-3b
https://huggingface.co/databricks/dolly-v2-12b