meatmanek's comments | Hacker News

By the same logic, wouldn't 4 months of severance pay be equivalent to forfeiting 92% of salary?

For something paid at regular intervals like RSUs, you really should never be looking at the total value of the grant, and instead think of it in terms of how many shares per paycheck/month/quarter/year you vest.

If you've got a cliff coming up, that's different. I'd be pissed if a company laid me off 11.5 months into a 12 month cliff or a few weeks before an annual bonus and didn't accelerate the vesting / bonus.


That's exactly my point. "Losing" your RSUs is the same as losing all the other income you would have earned had you kept working there.

> Be careful working CTRL + W into muscle memory though, I've lost count of how many browser tabs I've closed by accident...

I still maintain this is why macOS is the best OS for terminal work -- all the common keybindings for GUI tools use a different modifier key, so e.g. ⌘C and ⌘W work the same in your terminal as they do in your browser.

(Lots of the readline/emacs-style editing keybindings work everywhere in macOS as well: ^A, ^E, ^K, ^Y, but not ^U for some reason.)


100% agree, and I am surprised I do not see this mentioned more often. I came up on Linux, then had to use macOS for a job and got used to the Cmd/Ctrl separation; now I cannot use a terminal on Linux without major pain. I've tried a few of the key-rebinding options and they all feel clunky.

You could try a preprocessing step where you convert to hiragana, but I guess that would lose pitch accent information (e.g. 飴 vs 雨)

Exactly. Qwen only produces one pitch accent for pure-hiragana words. Even when it otherwise works (once the mixed-in Mandarin is removed), it takes significant effort to normalize the text to disambiguate heteronyms; otherwise the result (if you use voice cloning) is your favorite CV speaking in some weird, unknown accent :)

That got me wondering whether "convert to hiragana" is a solved task, or a research-team-and-five-years problem[0], and Google showed me an article[1] that gave me a facepalm. Quoting from Google Translate (square brackets are mine):

  > - As a result,
  >   - When the string "明日["tomorrow"]" is entered into TTS, the TTS model [・皿・] outputs an ambiguous pronunciation that sounds like a mix of "asu" and "ashita" (something like "[asyeta]").

  > From this, we found that by using the proposed method, it is possible to obtain data from private data in which the consistency between speech, graphemes, and phonemes is almost certainly maintained for more than 80% of the total.

  > Another possible cause is a mismatch between the domain of the training data's audio (all [in read-aloud tones]) and the inference domain.
My resultant rambling follows:

  1. Sounds like the general state of Japanese speech datasets is a mess
    1.1. they don't maintain good, useful correspondence between symbols and audio
    1.2. they tend to contain too many "transatlantic" voices and not enough casual speech
  2. Japanese speakers generally don't annotate pronunciations in text
    2.1. therefore web crawls might not contain enough information about how words are actually pronounced
    2.2. (potentially) there could be some texts that don't map to pronunciations at all
    2.3. (potentially) maybe spoken and written Japanese are still a bit divergent from each other
  3. The situation for Chinese/Sinitic languages is likely __nowhere__ near as absurd, so Chinese STT/TTS might not be well equipped to deal with this mess
  4. This feels like a much deeper mess than the commonly observed "a cloud in the sky" Japanese TTS problems, such as obvious basic alignment errors (e.g. pronouncing "potatoes" as "tato chi")
---

  0: https://xkcd.com/1425/
  1: https://zenn.dev/parakeet_tech/articles/2591e71094ea58
  2: https://qiita.com/maishikawa/items/dcadfeebf693080f0415

The lights are relatively easy to get. IIRC (it's been a bit since I watched their full video on the subject[1]), the hard part to find was the splitter that sends the sodium-vapor light to one camera and everything else to another camera.

1. https://www.youtube.com/watch?v=UQuIVsNzqDk


It would seem to me to be relatively easy to build something like that if you're okay shooting with effectively a full stop less light (just split the image with a half-silvered reflector and use a dichroic filter to pass the sodium-vapor light on one side).

The splitter would have to be behind the lens, so it would require a custom camera setup (probably a longer lens-to-sensor distance than most lenses are designed for too), but I can't think of any other issues.


At the end of this video they link to another video from a year ago [1] (this is the same link as the comment you were commenting on, whoops), where they recreate the sodium vapor process with a rig with a beam splitter, one side had a filter to reject sodium vapor light and the other has one to reject everything but sodium vapor light, and then a camera on each side.

The Disney process had the filter essentially built into the beam splitter, but AFAIK nobody knows how to make that happen again (or nobody who knows how knows it's a desirable thing). Seems like the optics might be cumbersome, but the results seem worthwhile.

Also, you still need careful lighting: you don't want your foreground illuminated by sodium-vapor light. But I wonder if you could light the background screen from behind (like a rear-projection setup) to reduce the amount of sodium-vapor light that reflects from the foreground to the camera.

[1] https://m.youtube.com/watch?v=UQuIVsNzqDk


We know how to make dichroic prisms (Technicolor used them when filming, as did "3 CCD" digital cameras), but I imagine that to get a sufficiently narrow rejection band for the sodium-vapor process, you would need to be smart about where you place the prism, since the stop-band of a dichroic filter changes with the angle of incidence.


Yup, I wanted to say that the prisms are hard to recreate, not the light itself.


> For example, Sinh[ArcCosh[2]] returns −√3 but √(2² − 1) = √3. The expression Mathematica returns for Sinh[ArcCosh[x]] correctly evaluates to −√3

But the expression given is sqrt((x-1)/(x+1))*(x+1), which for x=2 would be sqrt(1/3)*3 = sqrt(3).

did you mean Sinh[ArcCosh[-2]]?
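A quick numeric check of that expression at both signs (a sketch with plain floats, no CAS; `f` here is just the expression from the comment above, not anything Mathematica-specific):

```python
import math

def f(x):
    # the expression given for Sinh[ArcCosh[x]]: sqrt((x-1)/(x+1)) * (x+1)
    return math.sqrt((x - 1) / (x + 1)) * (x + 1)

print(f(2))    # sqrt(1/3) * 3 ≈ 1.732 = +sqrt(3)
print(f(-2))   # (-2-1)/(-2+1) = 3, so sqrt(3) * (-1) ≈ -1.732 = -sqrt(3)
```

So the expression stays real even for x < -1 and matches −√3 at x = −2, consistent with the question.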


For ASR/STT on a budget, you want https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3 - it works great on CPU.

I haven't tried on a raspberry pi, but on Intel it uses a little less than 1s of CPU time per second of audio. Using https://github.com/NVIDIA-NeMo/NeMo/blob/main/examples/asr/a... for chunked streaming inference, it takes 6 cores to process audio ~5x faster than realtime. I expect with all cores on a Pi 4 or 5, you'd probably be able to at least keep up with realtime.
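Back-of-envelope for those numbers (a sketch: the ~0.83 parallel efficiency is inferred from the 6-core/5x figure above, not measured, and Pi cores are slower than the Intel cores these numbers come from):

```python
# ~1 s of CPU time per 1 s of audio => single-core real-time factor ~1.0
cpu_sec_per_audio_sec = 1.0

def realtime_speedup(n_cores, efficiency=0.83):
    # chunked streaming re-runs inference on overlapping windows,
    # so scaling is somewhat less than linear in core count
    return n_cores * efficiency / cpu_sec_per_audio_sec

print(realtime_speedup(6))  # ~5x realtime, matching the observed figure
print(realtime_speedup(4))  # four cores, before discounting for slower Pi cores
```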

(Batch inference, where you give it the whole audio file up front, is slightly more efficient, since chunked streaming inference is basically running batch inference on overlapping windows of audio.)

EDIT: there are also the multitalker-parakeet-streaming-0.6b-v1 and nemotron-speech-streaming-en-0.6b models, which have similar resource requirements but are built for true streaming inference instead of chunked inference. In my tests, these are slightly less accurate. In particular, they seem to completely omit any sentence at the beginning or end of a stream that was partially cut off.


This seems to be estimating based on memory bandwidth / size of model, which is a really good estimate for dense models, but MoE models like GPT-OSS-20b don't involve the entire model for every token, so they can produce more tokens/second on the same hardware. GPT-OSS-20B has 3.6B active parameters, so it should perform similarly to a 3-4B dense model, while requiring enough VRAM to fit the whole 20B model.

(In terms of intelligence, they tend to score similarly to a dense model that's as big as the geometric mean of the full model size and the active parameters, i.e. for GPT-OSS-20B, it's roughly as smart as a sqrt(20b*3.6b) ≈ 8.5b dense model, but produces tokens 2x faster.)
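The rule of thumb above, in numbers (a sketch; the geometric-mean heuristic is community folklore, not a law):

```python
import math

total, active = 20e9, 3.6e9  # GPT-OSS-20B

# quality: roughly that of a dense model at the geometric mean
effective_dense = math.sqrt(total * active)   # ~8.5e9

# speed: decode tracks active params, so vs. the equivalent-quality
# dense model the speedup is sqrt(total / active)
speedup = effective_dense / active            # ~2.4x

print(f"~{effective_dense / 1e9:.1f}B-dense quality, ~{speedup:.1f}x faster")
```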


Yeah, I looked up some models I have actually run locally on my Strix Halo laptop, and it's saying I should have much lower performance than I actually get on models I've tested.

For MoE models, it should be using the active parameters in memory bandwidth computation, not the total parameters.
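A minimal version of that corrected estimate (a sketch; the 256 GB/s bandwidth figure and the 1 byte/param quantization are illustrative assumptions, not the calculator's actual inputs):

```python
def est_tokens_per_sec(bandwidth_gb_s, params_b, bytes_per_param=1.0):
    # decode is memory-bound: each token streams the (active) weights once
    return bandwidth_gb_s * 1e9 / (params_b * 1e9 * bytes_per_param)

bw = 256.0  # GB/s, roughly Strix Halo's LPDDR5X (assumed)
print(est_tokens_per_sec(bw, 20.0))  # total params:  ~13 tok/s (too pessimistic)
print(est_tokens_per_sec(bw, 3.6))   # active params: ~71 tok/s (closer to reality)
```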


The docs page addresses this:

> A Mixture of Experts model splits its parameters into groups called "experts." On each token, only a few experts are active — for example, Mixtral 8x7B has 46.7B total parameters but only activates ~12.9B per token. This means you get the quality of a larger model with the speed of a smaller one. The tradeoff: the full model still needs to fit in memory, even though only part of it runs at inference time.

> A dense model activates all its parameters for every token — what you see is what you get. A MoE model has more total parameters but only uses a subset per token. Dense models are simpler and more predictable in terms of memory/speed. MoE models can punch above their weight in quality but need more VRAM than their active parameter count suggests.

https://www.canirun.ai/docs


It discusses it, and their data shows they know the number of active parameters in an MoE model, but they don't seem to use that in their calculation. It gives me answers far lower than my real-world usage on my setup; the calculation lines up fairly well with what I'd expect if I were running a dense model of that size. Or, if I increase my memory bandwidth in the calculator by a factor of ~10 (the ratio between total and active parameters in the model), I get results much closer to real-world usage.


While your remark is valid, there are two small inaccuracies here:

> GPT-OSS-20B has 3.6B active parameters, so it should perform similarly to a 3-4B dense model, while requiring enough VRAM to fit the whole 20B model.

First, the token generation speed is going to be comparable, but not the prefill speed (context processing is going to be much slower on a big MoE than on a small dense model).

Second, without speculative decoding, it is correct to say that a small dense model and a bigger MoE with the same number of active parameters are going to be roughly as fast. But with a small dense model you will see token-generation improvements from speculative decoding (up to 3x the speed), whereas you probably won't gain much from speculative decoding on a MoE model (because two consecutive tokens won't trigger the same "experts", so you'd need to load more weights to the compute units, using more bandwidth).


So, this is all true, but this calculation isn't that nuanced. It's trying to get you into a ballpark range, and based on my usage on my real hardware (if I put in my specs, since it's not in their hardware list), the results are fairly close to my real experience if I compensate for the issue where it's calculating based on total params instead of active.

So by doing so, this calculator is telling you that you should be running entirely dense models, and sparse MoE models that may be both faster and better-performing are not recommended.


I agree, and I even started my response expressing my agreement with the whole point.

But since this is a tech forum, I assumed some people would be interested by the correction on the details that were wrong.


I'm guessing this is also calculating based on the full context size the model supports, which can be misleading depending on your use case. Even on a small consumer card with Qwen 3 30B-A3B, you probably don't need 128K context; depending on what you're doing, a smaller context and some tensor overrides will help. llama.cpp's llama-fit-params is helpful in those cases.
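Context size matters because the KV cache scales linearly with it. A rough sketch of why (the 48-layer / 4-KV-head / 128-dim / fp16 hyperparameters below are assumptions for illustration; check the actual model config before trusting the numbers):

```python
def kv_cache_gb(ctx, layers=48, kv_heads=4, head_dim=128, bytes_per=2):
    # 2x for keys and values; GQA means only kv_heads (not all
    # attention heads) contribute to the cache
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per / 1e9

print(kv_cache_gb(131072))  # full 128K context: ~12.9 GB of VRAM just for cache
print(kv_cache_gb(8192))    # 8K context: ~0.8 GB
```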


macOS is the best desktop UNIX for one simple reason: the ⌘ key. The fact that 99% of your GUI keybindings use a key that your CLI tooling cannot use eliminates conflicts and means that you don't have to remember things like "Copy is ^C in Chrome but ^⇧C in the terminal".


Using Linux with Toshy to get the best of both worlds wrt keybindings. Linux and KDE are amazing nowadays... I don't miss macOS, but I would hate Linux without Mac-style keybindings.


Yeah, I use Kinto (which seems to be what Toshy is originally based on). A recent Ubuntu update broke it though, and I accidentally deleted my config file while trying to fix it, so maybe now's a good time to try out Toshy. Looks like Toshy creates a python virtualenv instead of relying on system packages, which should make it a little more resilient to system package changes.


yeah... it's good stuff, and if a package update breaks Python, reinstalling Toshy is quick and easy. The config is safe if you only write your modifications inside the areas where they tell you to (though a backup of the folder is a good idea when doing that).

> I’ve got a first gen M1 Max and it destroys all but the largest cloud instances (that cost its entire current market value per month!)

You're either underestimating how big cloud instances can get or overestimating how much it costs to rent a cloud instance that would beat an M1 Max at any multi-core processing.

According to Geekbench, the M1 Max MacBook Pro has a single-core score of 2374 and a multi-core score of 12257; AWS's c8i.4xlarge (16 vCPUs) has 2034 and 12807, so they're relatively equivalent.

That c8i.4xlarge would cost you $246/mo at current spot pricing of $0.3425/hr, which is, what, 20% of the cost of that M1 Max MBP?
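The arithmetic, for what it's worth (a sketch; the ~$1,200 current market value for a used M1 Max MBP is an assumption, and spot prices move):

```python
spot_usd_hr = 0.3425                # c8i.4xlarge spot price at time of writing
monthly = spot_usd_hr * 24 * 30
print(f"${monthly:.0f}/mo")         # ~$247

mbp_usd = 1200.0                    # assumed market value of a used M1 Max MBP
print(f"{monthly / mbp_usd:.0%} of the laptop per month")   # ~21%
print(f"break-even after {mbp_usd / monthly:.1f} months")   # ~4.9 months
```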

As discussed recently in https://news.ycombinator.com/item?id=47291906, Geekbench underestimates the multi-core performance of very large machines for parallelizable tasks -- the benchmark's performance peaks at around 12x single-core performance. (I would have picked a different benchmark, but I couldn't find one with results for both the M1 Max and the Xeon Scalable 6 family.)

If your tasks are _not_ like that, then even a mid-range cloud instance like a 64-vCPU c8i.16xlarge (which currently costs $0.95/hour on the spot market) will handily beat the M1 Max, by a factor of about 4. The largest cloud instances from AWS have 896 vCPUs, so I'd expect they'd outperform the M1 Max by about 50-to-1 for trivially parallelizable workloads. Even if you stay away from the exotic instances like the `u7i-12tb.224xlarge` and stick to the standard c/m/r families, the c8i.96xlarge has 384 vCPUs (so at least 24x the compute power of that M1 Max) and costs $3.76/hr.


> That c8i.4xlarge would cost you $246/mo at current spot pricing of $0.3425/hr, which is, what, 20% of the cost of that M1 Max MBP?

A 5 month ROI on a hardware investment would be excellent, so not sure what you're trying to say here?


5 months is a lot worse than 1 month, which is what the parent claimed.


I find that if I ask an LLM to explain what its reasoning was, it comes up with some post-hoc justification that has nothing to do with what it was actually thinking. Most likely token predictor, etc etc.

As far as I understand, any reasoning tokens for previous answers are generally not kept in the context for follow-up questions, so the model can't even really introspect on its previous chain of thought.


I mostly find it useful for learning myself or for questioning a strange result; it usually works well for either of those. As you said, I'm probably not getting its actual reasoning from any reasoning tokens, but I never thought that was happening anyway. It's just a way of interrogating the current situation in the current context.

It provides a different result precisely because it's now looking at the existing solution and generating from there.


It depends on the harness and/or inference engine whether they keep the reasoning of past messages.

Not to get all philosophical but maybe justification is post-hoc even for humans.

