Nice job. I have a similar automated cron job that runs overnight and does the following when it encounters new pictures in a folder:
- Qwen3-VL 8b creates a verbose description + keywords
- Simpler CLIP encoder builds another set of tags
- Description is placed into an image RAG
- Keywords are embedded into the file name itself, joined with underscores
- Description/Tags/Keywords are all embedded in EXIF data on the image
I've got close to 30k images, so this pipeline gives me multiple overlapping ways of searching (natural language, keywords, etc.) to quickly retrieve them.
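The filename step can be sketched in a few lines. This is just an illustration of the idea, not the commenter's actual script; the function name and the double-underscore separator are my own choices:

```python
import re
from pathlib import Path

def embed_keywords_in_filename(path: Path, keywords: list[str]) -> Path:
    """Append underscore-joined keywords to the file stem, e.g.
    IMG_0042.jpg + ['beach', 'sunset'] -> IMG_0042__beach_sunset.jpg"""
    # Normalize each keyword to a safe lowercase token
    safe = [re.sub(r"[^a-z0-9]+", "", kw.lower()) for kw in keywords]
    safe = [s for s in safe if s]
    new_stem = f"{path.stem}__{'_'.join(safe)}" if safe else path.stem
    return path.with_name(new_stem + path.suffix)
```

This keeps the keywords greppable from a plain shell (`ls *sunset*`) even when the RAG index or EXIF tools aren't available.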
Nice job, but that polka‑esque music makes me want to feed myself to the yeti as quickly as possible. :)
You might consider a mechanic where moving your character in a sinusoidal pattern works like building up blue sparks in Mario Kart, rewarding the player with a temporary speed boost.
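One cheap way to approximate "sinusoidal movement" is to count direction changes in the player's lateral input within a rolling frame window. A toy sketch (all names and thresholds are made up for illustration):

```python
from collections import deque

class WeaveBoost:
    """Charge a speed boost when lateral input keeps alternating
    direction -- a rough stand-in for 'sinusoidal' movement."""

    def __init__(self, swings_needed: int = 4, window: int = 60):
        self.swings_needed = swings_needed   # direction changes required
        self.window = window                 # frames the changes must fit in
        self.change_frames = deque()
        self.last_sign = 0
        self.frame = 0

    def update(self, lateral_input: float) -> bool:
        """Feed one frame of lateral input; return True when the boost fires."""
        self.frame += 1
        sign = (lateral_input > 0) - (lateral_input < 0)
        if sign != 0 and sign != self.last_sign:
            if self.last_sign != 0:          # an actual direction change
                self.change_frames.append(self.frame)
            self.last_sign = sign
        # Drop direction changes that fell out of the window
        while self.change_frames and self.frame - self.change_frames[0] > self.window:
            self.change_frames.popleft()
        if len(self.change_frames) >= self.swings_needed:
            self.change_frames.clear()       # consume the charge
            return True
        return False
```

Tuning `swings_needed` and `window` controls how deliberate the weaving has to be before the boost triggers.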
Another SkiFree homage, posted a while back on HN, that has a nice pixel art aesthetic:
Not OP but one example is that recent VL models are more than sufficient for analyzing your local photo albums/images for creating metadata / descriptions / captions to help better organize your library.
The easiest way to get started is probably to use something like Ollama and use the `qwen3-vl:8b` 4‑bit quantized model [1].
It's a good balance between accuracy and memory, though in my experience it's slower than older model architectures such as LLaVA. Just be aware Qwen-VL tends to be a bit verbose [2], and you can't really control that reliably with token limits: it'll just cut off abruptly. You can ask it to be more concise, but that's hit or miss.
What I often end up doing, and I admit it's a bit ridiculous, is letting Qwen-VL generate its full detailed output and then passing that to a different LLM to summarize.
Strongly agree. Gemma3:27b and Qwen3-vl:30b-a3b are among my favorite local LLMs and handle the vast majority of translation, classification, and categorization work that I throw at them.
I'm using the default llama-server that's part of llama.cpp (Gerganov's LLM inference system), running on a headless machine with an NVIDIA 16 GB GPU, but Ollama's a bit easier to ease into since it has a preset model library.
There actually was an attempt on HN a little while back to use GenAI to convert facts, flashcards, lists, etc. into automated melodic mnemonics. The biggest issue in that particular case was that it was also generating the motif from scratch.
At least for me, part of the reason I can still sing the countries of the world is because the original Animaniacs song was set to a tune that was already familiar: “Jarabe Tapatío” (aka the Mexican Hat Dance).
I memorized that (and several other) Animaniacs songs without being familiar with the melody. Even Tom Lehrer's The Elements reached me before Pirates of Penzance did. I think the melody just needs to be simple; then it'll become "familiar" quickly.
However, for the use-case at hand (remembering IPv6 addresses) I don't think I'd use that. I'd just write them down somewhere, like, uh, perhaps, oh I know: the hosts file.
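For what it's worth, the hosts file does take IPv6 entries directly, same format as IPv4 (address below is from the reserved documentation range, not a real host):

```
# /etc/hosts -- name the address once, never memorize it again
2001:db8::1    nas.local
```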
“Catchiness” is probably more important than anything, hence the concept of the earworm aka stuck song syndrome. Even SOTA GenAI like Suno/Udio fall pretty short of generating genuinely engaging melodies.
You really need to let people use the Tab key instead of having to insert spaces manually. Even better would be to automatically start new lines at the correct indentation level, since Tab is often intercepted by browsers.
The current layout introduces ugly horizontal scroll bars when the viewport is even modestly resized, especially because the code snippets already use a fairly large font. As a result, you can’t see all the text at once. Since the program doesn’t auto-scroll to keep the cursor in view, it becomes very difficult to use unless you run it full screen.
Hey, thanks for the feedback! The text should auto-scroll; I've set up a minimum of 4 lines at the bottom, but that's maybe broken because of the overflow you mention. I'm going to update that :)
Also, the two sites you mention have a shitload of cookie banners and login requirements. I plan to build something with no login and no data being resold. Don't know if it's a good idea ^^
The percussive hit that starts the song when you press Play is SUPER jarringly loud. I'd trim it out of the track entirely.
I know you've mentioned tileset improvements, but just to put it out there: you've generated isometric buildings, yet the tileset you're using appears to be square-based. This creates a very incongruous style where the buildings don't feel like they're actually attached to the ground.
Finally, about the pixel art you're using: you're asking Nano Banana to generate it, but there are a couple of issues with prompting pixel art from GenAI models. The most obvious one is that the pixels aren't aligned to a traditional grid, which leads to really noticeable fringing. This is especially obvious when I use the scroll wheel to zoom in on some of the art assets.
I’d highly recommend using something like Unfake [1] to clean this up, aligning pixels and reducing the palette to something more consistent. It’s a bit more manual work, but it will make your assets look dramatically better.
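The kind of cleanup such tools automate boils down to two passes: collapse each cell of the intended pixel grid to its dominant color, then quantize to a small palette. A pure-Python toy version (nothing to do with Unfake's actual implementation; it operates on nested lists of RGB tuples for clarity):

```python
from collections import Counter

def nearest(color, palette):
    """Pick the palette entry closest to `color` (squared RGB distance)."""
    return min(palette, key=lambda p: sum((a - b) ** 2 for a, b in zip(color, p)))

def snap_to_grid(pixels, cell, palette):
    """Collapse each cell x cell block to its dominant color, then
    quantize to the palette; returns the downscaled image."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for y in range(0, h, cell):
        row = []
        for x in range(0, w, cell):
            block = [pixels[yy][xx]
                     for yy in range(y, min(y + cell, h))
                     for xx in range(x, min(x + cell, w))]
            dominant = Counter(block).most_common(1)[0][0]
            row.append(nearest(dominant, palette))
        out.append(row)
    return out
```

A real tool additionally has to *detect* the cell size and grid offset, which is the hard part; once those are known, the snapping itself is mechanical like the above.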
Thanks for all of this, really appreciate the detailed feedback and the links, will check those out.
Apologies about the drum hit at the start! I appreciate it's probably way too loud, especially on headphones. Cutting the first second out now, and also adding a fade-in to the menu music to ease you in!
The tileset/isometric mismatch and the pixel grid fringing are both great calls. I am by no means an artist so this is a big help! I hadn't come across Unfake before, that looks super useful to clean up the existing assets and any new ones I generate.
I came across https://www.pixellab.ai/ today while researching unit sprite animation & tileset generation, and I think I may have to redo a lot of the graphics entirely. Art seems to be the most expensive part outside of the Claude Max plan...
Running through your suggestions with Claude now to get them implemented. Cheers!
Very cool. Small bit of feedback - I'd suggest using pointer events so that the site works on both desktop and tablets. It didn't seem to respond to touch input when I tried it.
Thanks for the feedback. Are you using mobile Safari? Can you try changing the pressure sensitivity slider to 0 percent? It might be an issue with wrong pressure detection. I've had to write some custom code for the native iOS version to fix it but hadn't really tested in the browser.
Sure. I dialed the pressure sensitivity down to zero, but it still didn’t work. This was in mobile Safari (WebKit) on an iPad. It definitely works in Firefox on Android, though, so it does seem to be an iOS-specific issue.