tanelpoder's comments

Author here. I had a vague idea of what would be needed to build such QR codes, but not all of the stages. As I mentioned in my blog, I worked with codex, so I did this:

1) asked codex to list the stages/components needed for QR code generation in general

2) told codex to build a QR code generator in SQL based on these stages. It went off the rails and started adding "plsql" stuff for some matrix operations, etc., so I added guidance saying that it can just use string_agg(), array_agg() and either nested-loop-driven or recursive iteration through the space

3) Then it generated QR-code-looking codes, but they were not readable

4) I then installed the "qr" utility on my machine and instructed codex to iterate on its fixes and compare the SQL output to the "qr" output; that got it working for some QR codes

5) I then created a test suite that was just a 5-line shell script: it ran 1000 QR code generations with random text inputs through both SQL and "qr", diffed the outputs, and bailed out immediately once it found the first mismatch (a rough sketch of such a harness is below)

I then instructed codex to always run the full "test suite" when iterating on changes, and that got me to the final result - all the work took just an hour.
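
For illustration, here is roughly what such a differential test harness could look like. This is a Python rendition rather than the original 5-line shell script; the psql invocation and the qr_generate() SQL function name are placeholders I made up, and the exact "qr" CLI flags depend on which qr utility you have installed:

  #!/usr/bin/env python3
  # Differential test sketch: render random strings with a hypothetical
  # qr_generate() SQL function (via psql) and with the "qr" CLI, diff the
  # outputs, and stop at the first mismatch.
  import random
  import string
  import subprocess
  import sys

  def sql_qr(text: str) -> str:
      quoted = text.replace("'", "''")  # naive SQL string quoting for the demo
      out = subprocess.run(
          ["psql", "-X", "-A", "-t", "-c", f"SELECT qr_generate('{quoted}')"],
          capture_output=True, text=True, check=True)
      return out.stdout.strip()

  def cli_qr(text: str) -> str:
      # python-qrcode's "qr" command; --ascii keeps the output textual when piped
      out = subprocess.run(["qr", "--ascii", text],
                           capture_output=True, text=True, check=True)
      return out.stdout.strip()

  def main(iterations: int = 1000) -> None:
      alphabet = string.ascii_letters + string.digits + " "
      for i in range(iterations):
          text = "".join(random.choices(alphabet, k=random.randint(1, 40)))
          if sql_qr(text) != cli_qr(text):
              print(f"MISMATCH at iteration {i}: {text!r}")
              sys.exit(1)
      print(f"all {iterations} random inputs matched")

  if __name__ == "__main__":
      main()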


From the post:

"This improvement comes from a redesigned Windows storage stack that no longer treats all storage devices as SCSI devices"

And:

"Direct, multi-queue access to NVMe devices means you can finally reach the true limits of your hardware."


This also leaves more power & thermal allowance for the IO Hub on the CPU chip and I guess the CPU is cheaper too.

If your workload is mostly about DMAing large chunks of data between devices and you still want to examine the chunk/packet headers (but not touch all of the payload) on the CPU, this could be a good choice. You should have the full PCIe/DRAM bandwidth if all CCDs are active.

Edit: Worth noting that a DMA between PCIe and RAM still goes through the IO Hub (Uncore on Intel) inside the CPU.


At an old startup attempt we once created a nested-hierarchy metrics visualization chart that I later ended up calling Bookshelf Charts, as some of the boxes filled with smaller boxes looked like a bookshelf (if you tilted your head 90 degrees). Something between FlameGraphs and Treemaps. We also picked “random” colors for aesthetics, but it was interactive enough that you could choose a heat map color for the plotted boxes (where red == bad).

The source code got lost ages ago, but here are some screenshots of bookshelf graphs applied to SQL plan node level execution metrics:

https://tanelpoder.com/posts/sql-plan-flamegraph-loop-row-co...


Very neat. And if anyone from Plotly should happen to be reading this, a compact format like this might be an interesting option for Icicle Charts, akin to how the compact, indented version of Excel pivot tables saves horizontal space over the "classic" format pivot table.
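
For context, a minimal icicle chart with today's plotly.express API looks something like the sketch below (the SQL-plan-shaped labels and row counts are made-up illustration data, and the compact, indented layout suggested above is not an existing option):

  import plotly.express as px

  # Toy SQL-plan-shaped hierarchy: each node names its parent plan node.
  nodes   = ["SELECT", "HASH JOIN", "SEQ SCAN orders", "SEQ SCAN customers"]
  parents = ["",       "SELECT",    "HASH JOIN",       "HASH JOIN"]
  rows    = [1000,     1000,        900,               100]

  fig = px.icicle(names=nodes, parents=parents, values=rows)
  fig.show()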


Thanks for sharing, that is a neat in-between.


I figure it’s one way to keep your compiler version unchanged for eBPF work, while you might update/upgrade your dev OS packages over time for other reasons. The title of the linked issue is this:

“Checksum code does not work for LLVM-14, but it works for LLVM-13”

Newer compilers might use new optimizations that the verifier won’t be happy with. I guess the other option would be to find some config option to disable that specific incompatible optimization.


If anyone is interested in reading about a similar “local NVMe made redundant & shared over the network as block devices” engine, last year I did some testing of Silk’s cloud block storage solution (1.3M x 8kB IOPS and 20 GiB/s of throughput when reading the block store from a single GCP VM). They’re using iSCSI with multipathing on the client side instead of a userspace driver:

https://tanelpoder.com/posts/testing-the-silk-platform-in-20...


I submitted the link that Julian posted (to his article) on X yesterday. There’s indeed some refresh action in my browser too, but the article opens up.


When it was first submitted the URL was http://0.0.0.0:4000/ .. that's since been magically fixed (by a mod?). I checked http://www.hydromatic.net and the post wasn't visible there either (the most recent one was the Morel Rust release 0.2.0).


Interesting. I just posted the link from this X post [1] without modifying it (and it wasn’t 0.0.0.0…). I think one thing HN does is that when it detects an HTTP redirect, it starts using the referred-to URL instead…

[1] https://x.com/julianhyde/status/1982637782544900243


Yep.. it was the canonical header (not the fault of your submission; it's wrong on the blog and will affect SEO):

  <link rel="canonical" href="http://0.0.0.0:4000/draft-blog/2025/10/26/history-of-lambda-syntax.html" />
.. that the blog is called draft-blog doesn't help either.


(I submitted this link). My interest in this approach in general is about observability infra at scale - thinking about buffering detailed events, metrics and thread samples at the edge and later extracting only the things of interest, after early filtering at the edge. I'm a SQL & database nerd, so this approach looks interesting to me.


One snippet: "At this stage, the remaining events are recorded to a 40 Petabyte disk buffer."

A 40 PB disk buffer :-)


Indeed, it would be nice if there were a standardized API/naming scheme for internal NVMe events, so you wouldn't have to look up the vendor-specific RAW counters and their offsets. Somewhat like the libpfm/PerfMon2 library, which provides standardized naming for common CPU counters/events across architectures.

The `nvme id-ctrl -H` (human-readable) option does parse and explain some configuration settings and hardware capabilities in a more standardized, human-readable fashion, but the availability of internal activity counters and events varies greatly across vendors, products, firmware versions (and even your currently installed nvme & smartctl software package versions).
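
As a rough illustration of what the standardized interfaces do give you, here is a small sketch that pulls the SMART/health log as JSON via nvme-cli and prints whichever of the common counters the drive reports (the device path is an assumption and the exact JSON field names vary a bit between nvme-cli versions; vendor-specific internal counters are not in this log page at all):

  #!/usr/bin/env python3
  # Print a few standard NVMe SMART/health counters using stock nvme-cli.
  import json
  import subprocess
  import sys

  def smart_log(dev: str) -> dict:
      out = subprocess.run(["nvme", "smart-log", dev, "-o", "json"],
                           capture_output=True, text=True, check=True)
      return json.loads(out.stdout)

  if __name__ == "__main__":
      dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/nvme0"  # assumed device path
      log = smart_log(dev)
      for key in ("data_units_read", "data_units_written",
                  "host_read_commands", "host_write_commands",
                  "media_errors", "percent_used"):
          if key in log:
              print(f"{key:>22}: {log[key]}")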

Regarding eBPF (for an OS-level view), the `biolatency` tool supports the -F option to additionally break down I/Os by the IORQ flags. I have added the iorq_flags to my eBPF `xcapture` tool as well, so I can break down I/Os (and latencies) by submitter PID, user, program, etc. and see IORQ flags like "WRITE|SYNC|FUA" that help to understand why some write operations are slower than others (especially on commodity SSDs without a power-loss-protected write cache).

An example output of viewing IORQ flags in general is below:

https://tanelpoder.com/posts/xcapture-xtop-beta/#disk-io-wai...


It's not only NVMe/SSD that could use such standardization.

If you want detailed Ryzen stats you have to use ryzen_monitor. If you want detailed Seagate HDD stats you have to use OpenSeaChest. If you want detailed NIC queue stats there's ethq. I'm sure there are other examples as well.

Most hardware metrics are still really difficult to collect, understand and monitor.

