What's the most cost-effective alternative to Paperspace? I had a nightmarish experience with them last week: my account got locked twice while I was training a model on a 1.5 GB dataset that contained the string "Minecraft Server" somewhere.
When I experimented with Stable Diffusion and ROCm (AMD card), I had to do something similar, but with `pytorch-rocm`; and when I ran CPU-only, I used `pytorch-cpu`. So maybe your attempt didn't use the GPU at all, because 12 minutes is about what I got on a CPU for inference on other models of similar size.
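If it helps, here's a quick sanity check (a minimal sketch; it only assumes a reasonably recent PyTorch) to confirm which backend the installed build will actually use before blaming the model:

```python
# Minimal sketch: report which compute backend this PyTorch build can actually use.
import torch

if torch.cuda.is_available():
    # ROCm builds also report True here; torch.version.hip is set instead of torch.version.cuda.
    backend = "ROCm" if torch.version.hip else "CUDA"
    print(f"{backend} GPU:", torch.cuda.get_device_name(0))
elif torch.backends.mps.is_available():
    print("Apple Silicon GPU (MPS) available")
else:
    print("CPU-only build -- expect slow inference")
```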
The error message implies that the default compiled libraries on the M1 don't support the model format, even though it works fine on Paperspace.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Traceback (most recent call last):
File "/Users/fragmede/projects/llm/dolly/foo.py", line 5, in <module>
instruct_pipeline = pipeline(
^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/pipelines/__init__.py", line 776, in pipeline
framework, model = infer_framework_load_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/pipelines/base.py", line 271, in infer_framework_load_model
raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
ValueError: Could not load model databricks/dolly-v2-12b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.gpt_neox.modeling_gpt_neox.GPTNeoXForCausalLM'>).
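For what it's worth, that ValueError from pipeline() often hides the real underlying exception (out-of-memory, unsupported dtype, etc.). Here's a minimal debugging sketch that loads the classes directly so the original error surfaces; the float16 dtype and MPS device are my assumptions, not necessarily what the original script used:

```python
# Debugging sketch (assumes transformers and a recent torch are installed;
# float16 and MPS are my choices, not necessarily what foo.py used).
# Loading the classes directly usually shows the real exception that
# pipeline() collapses into the generic "Could not load model" ValueError.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "databricks/dolly-v2-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # fp32 weights for ~12B params are roughly 48 GB
)

device = "mps" if torch.backends.mps.is_available() else "cpu"
model = model.to(device)
print("Loaded on", device)
```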
No worries, it happens. I'll admit my answer wasn't clear that I was referring to the linked page rather than the question in the post. All good.
I attempted it with the Transformers library but failed. Not sure why; it might be a VRAM issue. I'm going to try on my far beefier personal MacBook Pro later tonight.
How much RAM is likely needed on an Apple Silicon machine for models like this? And for general use: 64, 96, or 128 GB? I'm trying to decide how large I should go for a new laptop.
I very recently purchased a MacBook Pro (M1 Max) with 64GB of RAM. I haven't experimented that much, but I was able to run inference with the 65B-parameter LLaMA model using quantized weights at a speed that was reasonably usable (maybe a touch slower than ChatGPT with GPT-4).
I haven't attempted to use the 65B model with non-quantized weights, but the smaller models work that way, if slowly. With 96GB of RAM (the upper limit for a MacBook Pro) you might be able to use even larger models, but I think you'd hit the limits of useful performance before that point.
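For a rough sense of why that works, the back-of-the-envelope weight-memory math (my own rough numbers, ignoring activations and the KV cache) looks like this:

```python
# Back-of-the-envelope weight memory for a 65B-parameter model (rough numbers;
# ignores activations, KV cache, and runtime overhead).
params = 65e9
for label, bytes_per_param in [("fp16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"{label:>5}: ~{params * bytes_per_param / 2**30:.0f} GiB")
# fp16 ~121 GiB (won't fit in 64GB), 8-bit ~61 GiB (tight), 4-bit ~30 GiB (fits)
```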
I should note that it can be a bit tricky getting things to work on the Mac's GPU. I couldn't get Dolly 6B to run on my work MBP, which theoretically should have enough RAM, though I still want to try it on my personal laptop.
I see a refurbished M1 with 2TB/128GB for $4700, and it looks like a similar price for an M2 with the same storage/RAM with my corporate discount (20-core CPU / 48-core GPU). This is a tough decision.
AFAIK current models can run even with 64GB, but I assume we will very likely have bigger models very soon, so I guess the answer is: as much as you can afford.
The next question is M1 or M2, and the impact of the varying number of GPU cores across the Pro, Max, and Ultra SKUs. I'm really tempted to buy a refurbished M1 Studio with 128GB because I think the RAM is the key. I haven't seen any benchmarks comparing different GPU core counts, i.e. different SKUs.
It runs slightly slower on the GPU than under llama.cpp, but uses much less power doing so.
I would guess the slowness is due to the immaturity of the PyTorch MPS backend. The asitop graphs show a bunch of CPU activity alongside the GPU, so it might be inefficiently falling back to the CPU for some ops and swapping layers back and forth (I have no idea, just guessing).
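One way to poke at that guess (a sketch; the env var is PyTorch's documented switch, though what it reveals depends on the model): without the flag, ops the MPS backend doesn't implement raise NotImplementedError, and with it they silently run on the CPU, which would line up with the CPU activity asitop shows.

```python
# Sketch: toggle PyTorch's CPU fallback for ops the MPS backend doesn't implement.
# Without the env var, an unsupported op raises NotImplementedError (telling you
# which op it is); with it set to "1", the op silently runs on the CPU instead.
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # must be set before importing torch

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.randn(4, 4, device=device)
print(x.device, x.sum().item())  # trivial ops; real models exercise far more of the backend
```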
Anyone managed to run it on an M1/M2 Mac yet?