Hey there! I worked on Dolly, and I work on Model Serving at Databricks. DollyV1 is GPT-J-based, so it'll run easily on llama.cpp. DollyV2 is Pythia-based, which is built with the GPT-NeoX library.
GPT-NeoX is not that different from GPT-J (it also has rotary embeddings, which llama.cpp supports for GPT-J). I would imagine it's not too heavy a lift to add NeoX architecture support.
Because the firehose of AI/GPT is a lot to try to take in, please ELI5: unpack this comment and provide more definitions.
-
Thank you.
Just so I am clear, does "parameters" refer to the total number of node-relation connections between a single node and its neighbors for that Prompt/Label? Or how would you explain it ELI5 style?
Sure! I'll try to summarize briefly, though I'll almost certainly oversimplify. There are a couple of open source language models trained by EleutherAI. The first one was called GPT-J, and it used some newer model architecture concepts. Subsequently, they released a model architected in the likeness of GPT-3, called GPT-NeoX-20B. It was architecturally quite similar to GPT-J, just with more parameters. Pythia is a family of models with the same architecture and the same dataset but different parameter sizes, built to test scaling laws.
DollyV2 is a Pythia model fine-tuned on the Databricks 15K dataset.
Augmenting the answer to address your follow-up: parameters are any trainable variables in a model's definition. Model training is a process where you basically tweak the parameters in your model and then re-evaluate the model on a metric judging its quality. A lot of models consist of matrix multiplication, so if you are multiplying matrix A of size 2x2 with matrix B of size 2x2, and both matrices can be tweaked, then you've got 8 parameters, since you've got 8 numbers that can be tweaked.
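To make that concrete, here's a tiny illustrative sketch (not Dolly's actual code, just the 2x2 example above in plain Python): two trainable 2x2 matrices give you 8 individually tweakable numbers, i.e. 8 parameters.

```python
# A toy "model": every entry in these two trainable matrices is one parameter.
A = [[0.5, -0.1],
     [0.3, 0.8]]   # 4 tweakable numbers
B = [[1.0, 0.2],
     [-0.4, 0.6]]  # 4 more tweakable numbers

def count_parameters(*matrices):
    """Count every individually tweakable number across the matrices."""
    return sum(len(row) for m in matrices for row in m)

print(count_parameters(A, B))  # 8
```

Real models like GPT-NeoX-20B just scale this same idea up to billions of such numbers (that's what the "20B" refers to).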