Hacker News: divingdragon's comments

Really, as an East Asian language user the rest of the comments here make me want to scream.

I am not sure if you mean me, as I just asked a question. I wonder what the best way is to handle this disparity for international software. It seems like either you punish the Latin alphabets, or the others.

> I wonder what the best way is to handle this disparity for international software. It seems like either you punish the Latin alphabets, or the others.

there are over a million codepoints in unicode, thousands for latin and other language-agnostic symbols, emojis, etc. utf-8 is designed to be backwards compatible with ascii, not to efficiently encode all of unicode. utf-16 is the reasonably efficient compromise for native unicode applications, hence it being the internal format of strings in C#, sql server, and such.
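for concreteness, here's a quick Node.js sketch of the per-codepoint byte costs (the sample characters are arbitrary picks, not from any benchmark):

```javascript
// Byte cost of single codepoints in UTF-8 vs UTF-16 (Node.js Buffer API).
// ASCII is 1 byte in UTF-8 but 2 in UTF-16; CJK codepoints in U+0800–U+FFFF
// are 3 vs 2; astral codepoints like emoji are 4 bytes in both.
const utf8Bytes  = (s) => Buffer.byteLength(s, 'utf8');
const utf16Bytes = (s) => Buffer.byteLength(s, 'utf16le');

console.log(utf8Bytes('a'),  utf16Bytes('a'));   // 1 2
console.log(utf8Bytes('語'), utf16Bytes('語'));  // 3 2
console.log(utf8Bytes('😀'), utf16Bytes('😀')); // 4 4
```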

the folks bleating about utf-8 being the best choice make the same mistake as the "utf-8 everywhere manifesto" guys: stats skewed by a web- and american-centric bias. sure, utf-8 is more efficient when your text is 99% markup and generally devoid of non-latin scripts, but that's not my database and probably not most people's


  > sure utf-8 is more efficient when your text is 99% markup and generally devoid of non-latin scripts, that's not my database and probably not most peoples
I think this website's audience begs to differ. But if you develop for S.Asia, I can see the pendulum swinging toward utf-16. Even then you have to account for this:

  «UTF-16 is often claimed to be more space-efficient than UTF-8 for East Asian languages, since it uses two bytes for characters that take 3 bytes in UTF-8. Since real text contains many spaces, numbers, punctuation, markup (for e.g. web pages), and control characters, which take only one byte in UTF-8, this is only true for artificially constructed dense blocks of text. A more serious claim can be made for Devanagari and Bengali, which use multi-letter words and all the letters take 3 bytes in UTF-8 and only 2 in UTF-16.»¹
In the same vein:

  «The code points U+0800–U+FFFF take 3 bytes in UTF-8 but only 2 in UTF-16. This led to the idea that text in Chinese and other languages would take more space in UTF-8. However, text is only larger if there are more of these code points than 1-byte ASCII code points, and this rarely happens in real-world documents due to spaces, newlines, digits, punctuation, English words, and markup.»²
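The quoted claim is easy to sanity-check in Node.js; the sample strings below are invented for illustration, not drawn from any real corpus:

```javascript
// Compare encoded sizes of dense CJK text vs CJK text mixed with ASCII markup.
const utf8  = (s) => Buffer.byteLength(s, 'utf8');
const utf16 = (s) => Buffer.byteLength(s, 'utf16le');

const dense = '统一码是计算机科学领域的业界标准';      // pure CJK: 48 vs 32 bytes
const mixed = '<p>统一码 (Unicode) 是业界标准。</p>'; // with markup: 45 vs 54 bytes

console.log(utf8(dense), utf16(dense)); // UTF-16 wins on the dense block
console.log(utf8(mixed), utf16(mixed)); // UTF-8 wins once ASCII markup appears
```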

The .net ecosystem isn't happy with utf-16 being the default, but it is there in .net and Windows for historical reasons.

  «Microsoft has stated that "UTF-16 [..] is a unique burden that Windows places on code that targets multiple platforms"»¹

___

1. https://en.wikipedia.org/wiki/UTF-16#Efficiency

2. https://en.wikipedia.org/wiki/UTF-8#Comparison_to_UTF-16



the talk page behind the utf-16 wiki is actually quite interesting. it seems the manifesto guys tried to push their agenda there, and the allusions to "real text" with missing citations are a remnant of that. obv there's no such thing as "real text" and the statements about it containing many spaces and punctuation are nonsense (many languages do not delimit words with spaces, plenty of text is not mostly markup, and so on..)

despite the frothing horde of web developers desperate to consider utf-16 harmful, it's still a fact that the consortium optimized unicode for 16 bits (https://www.unicode.org/notes/tn12), and their initial guidance, utf-8 for compatibility and portability (like on the web) and utf-16 for efficiency and processing (like in a database, or in memory), is still sound.


Interesting link! It shows its age though (22 years): it makes the point that utf-16 was already the "most dominant processing format", but if that were the deciding factor, then utf-8 would be today's recommendation, as utf-8 is now the default for online data exchange and storage. All my software assumes utf-8 as the default as well. But I can't speak for people living and trading in places like S.Asia, like you.

If one develops for clients requiring a varying set of textual scripts, one could sidestep an ideological discussion and just make an educated guess about the ratio of utf-8 vs utf-16 penalties. That should not be complicated; sometimes utf-8 would require one more byte than utf-16 would, sometimes it's the other way around.
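That guess can even be a measurement. A minimal Node.js sketch, assuming you can sample the strings a client will actually store (the `samples` array here is invented):

```javascript
// Total the bytes each encoding would need for a sample of real data,
// then pick whichever is smaller.
const encodedSize = (texts, enc) =>
  texts.reduce((sum, s) => sum + Buffer.byteLength(s, enc), 0);

const samples = ['hello world', '東京都 – 人口統計', '<td>42</td>'];
const u8  = encodedSize(samples, 'utf8');
const u16 = encodedSize(samples, 'utf16le');
console.log(u8 <= u16 ? 'store as utf-8' : 'store as utf-16');
```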


hn often makes me want to scream

UK ring circuits are typically protected by 30A or 32A circuit breakers.

I use the rule

    ||youtube.com/shorts/*$uritransform=/shorts\/(.*)/watch\/\$1/
in uBlock Origin.

(Except that it doesn't work if you click on a short from YouTube's interface - it loads with JavaScript which bypasses the redirection.)


try this Tampermonkey script:

```
// ==UserScript==
// @name         YouTube Shorts → Normal Player
// @match        *://www.youtube.com/*
// @run-at       document-start
// ==/UserScript==

function redirectShorts() {
  const match = location.pathname.match(/^\/shorts\/([a-zA-Z0-9_-]+)/);
  if (match) {
    location.replace(`/watch?v=${match[1]}`);
  }
}

// Catch initial page load
redirectShorts();

// Catch SPA navigation early (fires before page renders)
document.addEventListener('yt-navigate-start', redirectShorts);

// Fallback
document.addEventListener('yt-navigate-finish', redirectShorts);
```


> They did this to micro sd cards on the first switch.

What do you mean? From what I know it was bog-standard microSD(HC/XC) with the maximum supported speed being UHS-I with nothing proprietary.


It's more nuanced than that. The Nintendo software prevents things like save games from being saved in places it deems unworthy.


But it's not that only approved microSD cards work with save games. The Switch will not write save game data to any microSD card, regardless of features/manufacturer/branding/royalty payments/whatever. It's just not a supported feature, at all.

The GGP comment makes it sound like Nintendo only supported proprietary microSD cards at launch. While they did sell and recommend their branded microSD cards, one could use any brand of microSD card with the system and have the same functionality.


> Crashes would be avoided by having every train know about the train ahead and behind, and unable to make any move which would cause a collision (ie. it is not allowed to slam the brakes on if there is a train right behind you).

You are assuming that a train will never have to suddenly stop. That will never fly in the real world. Even if you consider a completely closed railway system with no possibility of external obstructions, there are many kinds of failure that would cause modern trains to apply emergency brakes due to fail-safe designs.

If you remove the bit about not being allowed to slam on the brakes, then you have just described SelTrac. Even the first version, used on the Vancouver SkyTrain (opened in the 80s), is capable of running trains closer than braking distance from what I remember reading. I don't believe it is actually enabled on many SelTrac systems though, because you still need safety margins. There is always the possibility that the train in front decelerates at a rate higher than its emergency braking rate, for example if it derails or collides with an external obstruction.
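To put rough numbers on that margin (all figures below are illustrative, not from any SelTrac documentation):

```javascript
// Stopping distance d = v^2 / (2a). A follower spaced "closer than braking
// distance" relies on the leader also needing its own stopping distance;
// if the leader stops near-instantly (derailment, collision), the follower
// still needs its full stopping distance on its own.
const stopDist = (v, a) => (v * v) / (2 * a); // v in m/s, a in m/s^2

const v = 22;  // ~80 km/h
const a = 1.3; // plausible emergency deceleration, m/s^2
console.log(Math.round(stopDist(v, a))); // ~186 m needed if the leader vanishes
```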


There is an official way to completely disable and remove Computrace (which I did on my T480), but I don't remember anything that allowed removing ME.


Qt 5.15 has been out for 4 years at this point; it is already past its "normal" commercial LTS and will reach the end of extended commercial LTS next year. They have no incentive to make this kind of change.


> Even CMD.EXE batch files support LF.

I don't know if it is the case on Windows 11, but I have certainly been bitten by CMD batch files using LF line endings. I don't remember the exact issue, but it may have been the known bug affecting labels. [1]

[1]: https://www.dostips.com/forum/viewtopic.php?t=8988#p58888


I do this all the time.

Every once in a while, navigating with the keyboard in Explorer will cause the window thread to hang up in a busy loop and I have to kill it. I have no idea if this is an Explorer bug or caused by other stuff.


On Linux you can use `lsusb` to get a cleaner list of all connected USB devices.

