It's intended for SQL generation and similar tasks, with cheap fine-tuning and inference, not for answering general knowledge questions. Their blog post is pretty clear about that. If you just want a chatbot, this isn't the model for you. If you want to let people without SQL training ask questions of your data, it might be really useful.
Sorry, it sounds like you know a lot more than I do about this, and I'd appreciate it if you'd connect the dots. Is your comment a dig at either Snowflake or Llama? Where are you finding the unquantized size of Llama 3 70B? Isn't it extremely rare to do inference with large unquantized models?
For decent performance, you need to keep all the parameters in memory in both cases. Well, with a RAID-0 of two PCIe 5 SSDs (or four PCIe 4) you might get 1 t/s loading experts from disk on Snowflake Arctic... but that is slooow.
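For a rough sense of where the ~1 t/s figure comes from, here's a back-of-envelope sketch. The numbers are assumptions, not from the thread: roughly 17B active parameters per token for Arctic's MoE, fp16 weights (2 bytes/param), and about 14 GB/s sequential read per PCIe 5 NVMe drive.

```python
# Back-of-envelope: tokens/s when streaming expert weights from disk.
# All constants below are assumed, ballpark figures.
ACTIVE_PARAMS = 17e9      # assumed active params per token (MoE)
BYTES_PER_PARAM = 2       # fp16
SSD_READ_BPS = 14e9       # assumed sequential read of one PCIe 5 SSD
N_SSDS = 2                # RAID-0 pair

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM  # ~34 GB per token
bandwidth = SSD_READ_BPS * N_SSDS                  # ~28 GB/s aggregate
tokens_per_second = bandwidth / bytes_per_token
print(f"{tokens_per_second:.2f} t/s")              # on the order of 1 t/s
```

Quantizing the weights (e.g. 4-bit instead of fp16) scales this up proportionally, which is why people rarely run large models unquantized.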
It's a statistical model of language. If it wasn't trained on text that says "I don't know that", then it's not going to produce that text. You need to use tools that can look at the logits produced and see if you're getting a confident answer or noise.
I just asked it an economics question and asked it to cite its sources.
All the links provided as sources were complete BS.
Color me unimpressed.