More

ukuina · 2026-03-04T15:55:41 1772639741

Neat! What does the stack look like?

ukuina · 2026-03-04T06:54:22 1772607262

I find StrongDM's Dark Factory principles more immediately actionable (sorry, Simon!): https://factory.strongdm.ai/principles

eviluncle · 2026-03-04T10:24:20 1772619860

Not sure there's anything to be sorry for, he literally wrote about it a few weeks ago:

https://simonwillison.net/2026/Feb/7/software-factory/

9wzYQbTYsAIc · 2026-03-04T07:10:48 1772608248

I second that, sometimes it's defensibly worth throwing token fuel at the problem and validate as you go.

ukuina · 2026-03-02T16:57:22 1772470642

Can you allow placing the VM on an external disk?

Also, please allow Cowork to work on directories outside the homedir!

lxgr · 2026-03-02T17:47:59 1772473679

I suppose you could just symlink the directory it's in?

ukuina · 2026-03-01T17:19:35 1772385575

> code is for humans to read

Is this still true?

ukuina · 2026-02-28T16:29:40 1772296180

Neat! I've previously used something similar: https://www.emergentmind.com/

jbdamask · 2026-02-28T16:58:22 1772297902

Cool. I hadn't seen Emergent Mind

jazzpush2 · 2026-02-28T20:53:36 1772312016

It's very bad in my experience. It hallucinates like crazy - e.g. something simple as enumerating the correct hidden dimension for a transformer-based model (same across all layers) it gets wrong often.

ukuina · 2026-02-28T16:03:35 1772294615

Abstract:

> We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions. Focusing on failures emerging from the integration of language models with autonomy, tool use, and multi-party communication, we document eleven representative case studies. Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. In several cases, agents reported task completion while the underlying system state contradicted those reports. We also report on some of the failed attempts. Our findings establish the existence of security-, privacy-, and governance-relevant vulnerabilities in realistic deployment settings. These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms, and warrant urgent attention from legal scholars, policymakers, and researchers across disciplines. This report serves as an initial empirical contribution to that broader conversation.

ukuina · 2026-02-26T16:47:44 1772124464

Seems adjacent, with some overlap.

ukuina · 2026-02-24T01:45:04 1771897504

API usage is not banned.

ukuina · 2026-02-20T20:07:04 1771618024

A vuln scanner is dual-use.

ukuina · 2026-02-20T06:31:25 1771569085

Not so sure about that. There are many distinct LLM "smells" in that comment, like "A is true, but it hides something: unrelated to A" and "It's not (just) C, it's hyperbole D".

Kiro · 2026-02-20T08:51:47 1771577507

I personally love that phrasing even if it's a clear tell. Comparisons work well for me to grasp an idea. I also love bullet points.

So yeah, I guess I like LLM writing.

ehnto · 2026-02-20T08:35:24 1771576524

Sure, but you can read articles that predate LLMs which have the same so called tells.

lelanthran · 2026-02-20T09:03:01 1771578181

> Sure, but you can read articles that predate LLMs which have the same so called tells.

Not with such a high frequency, though. We're looking at 1 tell per sentence!