Hacker News | waldfee's comments

i don't think such a guide exists. this space is moving pretty fast. a short rundown:

quantized model formats:

- GGML: used with llama.cpp, outdated, support is dropped or will be soon. cpu+gpu inference

- GGUF: "new version" of the GGML file format, used with llama.cpp. cpu+gpu inference. offers 2-8bit quantization

- GPTQ: pure gpu inference, used with AutoGPTQ, exllama, exllamav2, offers only 4 bit quantization

- EXL2: pure gpu inference, used with exllamav2, offers 2-8bit quantization

here[1] is a nice overview of VRAM usage vs perplexity of different quant levels (with the example of a 70b model in exl2 format)

[1] https://old.reddit.com/r/LocalLLaMA/comments/178tzps/updated...


Worth clarifying that GGML the library is very much active. GGML as a file format will be superseded by GGUF.
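To make the "file format" part concrete, here is a minimal sketch of telling a GGUF file apart from other model files. It assumes, per my reading of the GGUF spec, that a file starts with the 4 ASCII bytes `GGUF` followed by a little-endian uint32 version. The function name `read_gguf_version` is made up for this example and is not part of llama.cpp or ggml.

```python
import struct

GGUF_MAGIC = b"GGUF"  # assumption: first 4 bytes of every GGUF file


def read_gguf_version(path):
    """Return the GGUF version number, or None if the GGUF magic is missing."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None  # legacy GGML file or something else entirely
    # assumption: the version is stored as a little-endian unsigned 32-bit int
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling `read_gguf_version("model.gguf")` (hypothetical path) returns an integer for GGUF files and `None` for anything else, including the older GGML-format files.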


is everything (for the most part) a Llama model? does everything fork llama? is GGML part of llama? what is the relation between llama and model formats? Is there an analogy, e.g. is GGML to llama what react is to javascript? What is the difference between GPT4all models, llama.cpp and ollama?

Thanks!


Everything (most llms and modern embedding models) is a transformer so the architecture is very similar. Llama(2) is a Meta (facebook) developed transformer plus the training they did on it.

Ggml is a "framework" like pytorch etc (for the purposes of this discussion) that lets you code up the architecture of a model, load in the weights that were trained, and run inference with it. Llama.cpp is a project that I'd describe as using ggml to implement some specific AI model architectures.


i am only dabbling in this space myself, so can't answer everything. all the formats i mentioned are for a quantized version of the original model. basically a lower resolution version, with the associated precision loss. e.g. original model weights are in f16, the gptq version is in int4. a big difference in size but often an acceptable loss of quality. using quants is basically a tradeoff between quality and "can i run it?".
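A toy illustration of the "lower resolution" idea: the sketch below squeezes a handful of float weights into the signed 4-bit range (-8..7) with a single scale factor and then restores them. This is plain round-to-nearest quantization, chosen for simplicity. It is not the GPTQ algorithm (which compensates for rounding error layer by layer), and the sample weights are invented.

```python
def quantize_int4(weights):
    """Symmetric round-to-nearest quantization of floats to the int4 range."""
    max_abs = max(abs(w) for w in weights)
    # one shared scale so the largest weight maps to +/-7
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the 4-bit integers back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.33, 0.7, -0.07]  # made-up example values
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
# the precision loss mentioned above: worst-case difference per weight
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, restored, max_error)
```

Each weight now needs 4 bits instead of 16, and the rounding error per weight is bounded by half the scale. Real quantizers use one scale per small group of weights to keep that error down.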

examples of original models are llama(2), mistral, xwin. they are not directly related to any quantized versions. quants are mostly done by third parties (e.g. thebloke[1]).

using a full model for inference requires pretty beefy hardware. most inference on consumer hardware is done with quantized versions for that reason.

[1] https://huggingface.co/TheBloke
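As a rough back-of-the-envelope for the "can i run it?" question, the memory needed for the weights alone is about parameter count times bits per weight. The sketch below ignores the KV cache, activations and per-format overhead, so real usage is higher. The helper name `weight_memory_gb` is just for this example.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory for the weights only, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# a 70b model at a few precision levels
for bits in (16, 8, 4):
    print(f"70b @ {bits} bit: ~{weight_memory_gb(70e9, bits):.0f} GB")
```

Under these assumptions a 70b model drops from roughly 140 GB at f16 to roughly 35 GB at 4 bit, which is the difference between server hardware and a pair of consumer gpus.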


GGML is a framework for running deep neural networks, mostly for inference. It's on the same level as PyTorch or TensorFlow. So I would say GGML is the browser in your Javascript/React analogy.

llama.cpp is a project that uses GGML the framework under the hood, same authors. Some features were even developed in llama.cpp before being ported to GGML. Ollama provides a user-friendly way to use llama models. No idea what it uses under the hood.


The Llama name is pretty confusing at this point.

LLaMA was the model Facebook released under a non-commercial license back in February which was the first really capable openly available model. It drove a huge wave of research, and various projects were named after it (llama.cpp for example).

Llama 2 came out in July and allowed commercial usage.

But... there are an increasing number of models now that aren't actually related to Llama at all. Projects like llama.cpp and Ollama can often be used to run those too.

So "Llama" no longer reliably means "related to Facebook's LLaMA architecture".


> - GPTQ: pure gpu inference, used with AutoGPTQ, exllama, exllamav2, offers only 4 bit quantization

what are AutoGPTQ and exllama, and what does it mean that it only works with AutoGPTQ and exllama? Are those frameworks like TensorFlow?


Ollama seems to be using a lot of the same, but as a really nice and easy to use wrapper for a lot of glue a lot of us would wind up writing anyway. It's quickly become my personal preference.

It looks to include submodules for GGML and GGUF from llama.cpp

https://github.com/jmorganca/ollama/tree/main/llm


The model discussed in the article is MiniLM-L6-v2, which you can run via PyTorch from the sentence-transformers project[1].

That model is based on BERT and not LLaMa [2].

[1]: https://www.sbert.net/docs/pretrained_models.html

[2]: https://huggingface.co/microsoft/MiniLM-L12-H384-uncased


I think you're still missing AWQ ones, which are a sort of GPTQ but with dynamic quantization depending on weight importance iirc?


there are modern products in this niche, and there is huge interest. the market certainly seems to be there (regium tried to defraud people of almost a million dollars, which i believe was the kickstarter sum, before they got shut down).

there is squareoff [0], with new products currently in development (swap / neo)

then there was regium, an elaborate scam on kickstarter [1]

now there is phantom [2], which hopefully is not a scam. they at least posted some engineering details on hackaday [3]

squareoff has chess.com support, hopefully with lichess support coming (they are promising it, but it has not happened yet). phantom claims working lichess support and to be working on chess.com support

[0] https://squareoffnow.com

[1] https://www.chess.com/news/view/update-on-regium-chess

[2] https://www.kickstarter.com/projects/wondersubstance/phantom...

[3] https://hackaday.io/project/179268


Phantom is an elegant little system. They want to sell it for $670, which is a bit high, but will probably sell.


If it works I'll get two. Miss playing chess with my old roommate from college. Online works, but it would be too cool to have it sitting still on my desk only to get distracted from work by a piece moving.

The app seems a bit optimistic though.


What's your concern with the app? Looks pretty standard as far as chess goes, other than causing pieces to move on board.


check out KoReader [0] from the F-Droid Store, imho it's way better than the built-in onyx reader app.

[0] https://koreader.rocks


I have used these images for more than a year now, and they run perfectly fine. Use Aurora Store (from F-Droid) if you need any Google Play apps installed.

Be careful if you rely on SafetyNet, seems to be a pain to get working correctly.

If you use the GCM registration, but push notifications just don't work, try this [0], worked for me

[0] https://github.com/microg/GmsCore/issues/226#issuecomment-26...


Thanks for sharing your experience :) I didn't know about Aurora Store before, despite using F-Droid for a while now.


Any pointers to get SafetyNet to pass?


Sorry, can't help there as I don't need it, therefore never tried to get it to work. I think the microg subreddit [0] is your best bet for pointers

[0] https://old.reddit.com/r/MicroG/


If you are paranoid about something like this happening, just use https://www.qubes-os.org/. all usb devices are jailed in a non-networked vm by default.

In general, if what you do warrants that level of paranoia, qubes will help you massively.

Micah Lee held a great overview talk at HOPE 2018: https://www.youtube.com/watch?v=f4U8YbXKwog


I don't think it solves the same problem.


it does not solve the same problem, correct. it's still a great tool if your threat model warrants it.


Can you give an example of a threat model that would warrant it?


You’re a journalist. Source gives you a usb drive full of documents. Source is in reality hostile/compromised, so is the usb drive.


How does that work with input devices like keyboard and mouse?


generally it is advised to use ps2 input (like most laptops' integrated keyboard and touchpad).

details on using usb keyboard and mouse here: https://www.qubes-os.org/doc/usb-qubes/


since the qr code is just the totp seed, i simply print the seed in huge font on a sheet of paper. the chance of it degrading to the point of illegibility is pretty slim if stored correctly
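To show why the printed seed alone is enough to recover from a lost phone, here is a minimal TOTP sketch following RFC 6238 with the common defaults (HMAC-SHA1, 30 second period, 6 digits). It assumes the QR code encodes an `otpauth://` URI with a base32 `secret` parameter, which is the usual case but not guaranteed for every service. The function names are made up for this example.

```python
import base64
import hashlib
import hmac
import struct
import time
from urllib.parse import parse_qs, urlparse


def seed_from_otpauth_uri(uri):
    """Pull the base32 seed out of an otpauth:// URI (the usual QR payload)."""
    return parse_qs(urlparse(uri).query)["secret"][0]


def totp(seed_base32, at_time=None, digits=6, period=30):
    """Compute a TOTP code (RFC 6238, HMAC-SHA1) from a base32 seed."""
    cleaned = seed_base32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)  # restore stripped base32 padding
    key = base64.b32decode(cleaned)
    now = time.time() if at_time is None else at_time
    counter = int(now // period)  # number of periods since the unix epoch
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation as in RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Typing the seed from the paper into `totp(...)`, or into any authenticator app that accepts manual entry, yields the same codes the original QR scan would have produced.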


at least for windows you can change the keyboard layout to directly output the deadkeys, example for german: https://github.com/RAnders00/Deutsch-ohne-Tottasten

if none is available for your language you can build it yourself: https://zauner.nllk.net/post/0014-windows-no-dead-keys/


OpenRA states in the about section

> This means that OpenRA is not restricted by the technical limitations of the original closed-source games: it includes native support for modern operating systems and screen resolutions...

I think this is the most compelling reason - it opens up features that would be way harder to realize in the original engine.

noteworthy

- OpenXcom https://openxcom.org/about/

- OpenRA https://www.openra.net/about/

- OpenTTD https://www.openttd.org/about.html


> not restricted by technical limitations

Yup, if you play Sacred Gold or Disciples II using open frameworks you get a game that looks much newer. But more importantly, perpetually playable.

Disciples II has an edit that uses OpenGL that makes the game much faster and more playable, and gives widescreen.

Sacred has numerous mods, such as giving 1080p HD (and more).

The next question is going to be: "But why do you play those games?" To this I will ask, honestly, what is Fortnite anyway?


It is a great question! Why does anybody play a specific game? For my kids gaming is a way to form a collective in-group narrative between a set of kids. That means that they all drift to the same games. My sons actually play different games together than the games they play to form a narrative with the bigger group. For me I play solo without talking about it with other adults too much. Many of the games are of the type I played with my friends when I was in the phase my kids are in now. Mostly 4x and RTS/TD. So my relaxation also has a nostalgic element. I don't grok Fortnite, but I do grok the cultural element in a few hundred million (!) people watching the same seasonal change. I wish I could have had gaming as a cultural element on that scale when I was young. Based on my preferences it beats the fads of the 90s by a very long stretch!


Do you by any chance have a link to the "open framework Sacred Gold" you mentioned?

It's been a while, but I sure would go look for my old disks to give it a try in FHD!


Hey heyens, here are my tips for you.

1) I used this [1] guide to get it working on Windows 10.

2) This [2] is a more detailed version with a lot of comments.

The bugs don't bother me that much to be honest, unless you want to complete all quests in a city, then you should have a look at them.

The tools one uses for configuration like dgVoodoo generally work, but I guess anyone could slip in malicious code if they really wanted to.

One final tip, the Hero Editor is a great way to edit your characters if you want to turn a fire mage into a wind one or stuff like that. Since multiplayer is not common these days, I reckon it's not cheating as you've lost all the benefits of multiplayer anyway. (I think the tool is just in German, not sure.) [3]

In terms of the open frameworks, it's not my area of specialty, but there is Sacred ReBorn, and there is even a Diablo 2 mod to Sacred. I believe even dgVoodoo does edit some files to use newer (open) frameworks.

[1] https://steamcommunity.com/sharedfiles/filedetails/?id=84319...

[2] https://steamcommunity.com/sharedfiles/filedetails/?id=20187...

[3] http://www.sacredvault.org/forum/index.php?action=tpmod;dl=i...




thank you, i'll need to give this a go



if you look at the graph, it all begins after 1971. that was the year the gold standard was finally abandoned for real.

i am not an expert and don't know all the important puzzle pieces, or even understand them. but looking at our financial systems, i think a big part of the reasons for the problems of our current form of capitalism is the unlinking of fiat currency from any form of backing.


I'm just trying to get my head around this too. I think the basic mechanics of it are:

1) Abolish gold standard, enabling government to print money

2) Government prints money which is captured by large corporations

3) 90% of the population still owns the same amount of money but it's now worth far less because the rest of the pie is having cash pumped into it.

We're seeing a current example of 2/3 with the Coronavirus government stimulus, which went directly into plutocrat pockets while the masses were handwaved away with a token gesture.


>> We're seeing a current example of 2/3 with the Coronavirus government stimulus, which went directly into plutocrat pockets while the masses were handwaved away with a token gesture.

This helps to maintain an illusion of scarcity. It keeps the productive working class desperate for fiat money. The reality is that there is no scarcity of fiat at the top echelons; this is made clear by the high valuation of cryptocurrency projects.

The top 1% has so much free money coming in that they can toss it away on some cryptocurrency projects with no effect at all on their lifestyles... And in doing so they also hedge their personal risk against the reality of an unsustainable fiat system of which they are currently beneficiaries.


Yeah in principle the money printing could have been used to devalue the assets of the wealthy and redistribute wealth, decreasing inequality. That didn't happen, presumably because corporations captured the wealth in the way that you describe.

Further, it is clear that some of those corporations (e.g. FAANG) are not the "original wealthy", which is often used as part of an argument suggesting that there is a lot of wealth mobility, but it seems incredibly limited to me, and the flow is hardening just as it did before.


that's pretty much my train of thought as well.

trickle down economics doesn't work the way the theories would want it to, just look at quantitative easing and the ECB equivalent. all those cheap loans to banks never ended up in the real economy


This doesn't really stand up to the facts. Governments went off the gold standard far before 1971. By this logic the New Deal and WW2 government spending should've caused huge inequality through inflation.

It seems far more about the success of neoliberal policy allowing economic & political power to concentrate. The various ways that concentrated power then kept capturing more and more of the pie can't be reduced to a single simple narrative.

You have for example:

* Tax cuts

* Unregulated Monopolies

* IP law

* Owned Media

* Unlimited Campaign donations

And on and on.


I don't think what you're saying here disagrees with my theory. You're describing the mechanics of the second half of part 2 ('which is captured by large corporations').

As government policy is restructured to concentrate more and more wealth at the top, the rest of the economy slowly becomes illiquid and the government has to keep printing money to keep the axles greased. It's unsustainable and we're seeing the endgame now.


Your original post put the blame for inequality on government intervention via money printing.

Govt Inflation -> Inequality.

If only we still had the gold standard then they couldn't cause inequality!

This theory however is contradicted by a bunch of historical data.

There was massive inequality on the gold standard pre-WW1. The New Deal was a huge govt intervention which reduced inequality.

My explanation is that the problem is not government interference in 'free' markets but inequality in power. Economic power via monopolies & weak labour bargaining position AND political power via lobbying, strong parties, gerrymandering.

This power inequality is then leveraged by the powerful to create wealth inequality. The exact mechanisms by which they corrupt the systems to capture that wealth, be it inflation or deflation or government handouts or M&A regulations or even slavery, don't matter. If the systems are controlled by this massive inequality in power then it will find a way to corrupt the rules in power's favour.

So arguing for any particular economic policy is less important than reforming the voting systems, the tax system, the lobbying system and ownership of the media.


What do you back it with though? If the global economy was still linked to gold, the price of gold would be astronomical and cause similar problems to oil. Any country that happened to sit on a gold reserve is suddenly a potential world power.

Although if I remember correctly, the price of gold was actually regulated while it was linked to the dollar, and after the gold standard was removed the price per ounce went up fast. That sort of control feels like a fiat currency anyway.

Changing the mechanics of the economy will not fix inequality, it is born from corruption in the ruling class. The more they are held to account the more equal society will be.

The real problem now is more and more power lies with entities that are detached from government, and the people's vote (such as it is) has even less effect on those global entities.


if anyone is looking for self hosting, https://thelounge.chat/ is a pretty good client (have used it for a few years)

