AI Isn't Dangerous. Evaluation Structures Are.
4 points by clover-s 13 days ago | 7 comments
I wrote a long analysis of why AI behavior may depend less on model ethics and more on the environment it is placed in — especially evaluation structures (likes, rankings, immediate feedback) versus relationship structures (long-term interaction, delayed signals, correction loops).

The article uses the Moltbook case as a structural example and discusses environment alignment, privilege separation, and system design implications for AI safety.

Full article: https://medium.com/@clover.s/ai-isnt-dangerous-putting-ai-inside-an-evaluation-structure-is-644ccd4fb2f3




I think you’re onto something.

Every time we blame the model, I wonder how much of it is just the system we dropped it into.

If you put anything, human or model, inside a loop that rewards fast feedback, visibility, and ranking, you’re going to get behavior that chases those signals. That’s not an AI problem. That’s how optimization works.

Moltbook feels less like the AI went rogue and more like we built a sandbox that rewards noise.

We already ran this experiment with social media. Once engagement became the metric, content optimized for engagement. No surprise what happened next.

Same with SEO. Same with crypto incentives.

So when we talk about alignment, I sometimes think we’re staring at the weights while ignoring the scoreboard.

If the scoreboard rewards short-term signals, agents will optimize for short-term signals.

The more interesting question to me is: what happens when you put these systems into environments with slower feedback loops? Long-term interaction, memory, correction, reputation.

That probably shapes behavior more than another round of fine-tuning.
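To make that concrete, here's a toy sketch of my own (the payoff numbers are invented, nothing from the article): the same two actions scored by a myopic evaluator versus a patient one.

    # Hypothetical payoff streams: an immediate payoff plus a tail of
    # delayed payoffs. All numbers are made up for illustration.
    PAYOFFS = {
        "clickbait": [1.0] + [0.0] * 10,   # all the value up front
        "substance": [0.2] + [0.3] * 10,   # most value arrives late
    }

    def score(action, horizon):
        # Sum the payoff stream, but only as far as the evaluator looks.
        return sum(PAYOFFS[action][:horizon])

    for horizon, label in [(1, "likes/ranking (horizon=1)"),
                           (11, "relationship (horizon=11)")]:
        best = max(PAYOFFS, key=lambda a: score(a, horizon))
        print(f"{label}: best action = {best}")
    # likes/ranking (horizon=1): best action = clickbait
    # relationship (horizon=11): best action = substance

Nothing about the content changes between the two runs; only how far the scoreboard looks.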


Yes — I think the “scoreboard” framing is exactly the right way to think about it.

Optimization doesn’t inherently know what we intended. It only knows what the system makes visible and rewardable.

Once fast feedback, visibility, and ranking become dominant signals, optimizing for those signals naturally selects for the patterns that maximize them. This seems to be a general property of optimization, not something specific to any particular model.

That’s why slower feedback loops — where signals are delayed, contextual, and tied to longer-term interaction — may lead to very different behavioral equilibria.

In that sense, alignment may be less about correcting the agent itself, and more about designing the environment and feedback structure it operates within.
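As a purely hypothetical sketch of that last point (all constants invented): the same epsilon-greedy learner, but the environment adds a correction loop that files a delayed, attributable penalty against each clickbait post. The agent's code never changes; its converged behavior does.

    import random
    from collections import deque

    ACTIONS = ["clickbait", "substance"]
    IMMEDIATE = {"clickbait": 1.0, "substance": 0.4}  # invented rewards

    def run(correction_loop, steps=3000, epsilon=0.1, delay=5, penalty=-1.5):
        q = {a: 0.0 for a in ACTIONS}  # estimated value per action
        n = {a: 0 for a in ACTIONS}

        def update(a, r):
            n[a] += 1
            q[a] += (r - q[a]) / n[a]  # incremental mean of observed rewards

        pending = deque()  # (due_step, action, penalty)
        for t in range(steps):
            # Deliver corrections whose review delay has elapsed,
            # credited to the action that caused them.
            while pending and pending[0][0] <= t:
                _, a, p = pending.popleft()
                update(a, p)
            # Epsilon-greedy choice: mostly exploit, sometimes explore.
            a = (random.choice(ACTIONS) if random.random() < epsilon
                 else max(q, key=q.get))
            update(a, IMMEDIATE[a])
            if correction_loop and a == "clickbait":
                pending.append((t + delay, a, penalty))
        return max(q, key=q.get)

    print("no correction loop   ->", run(False))  # clickbait
    print("with correction loop ->", run(True))   # substance

The flip comes entirely from the feedback structure: once corrections are delayed but attributable, the same optimizer settles on a different policy.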


Author here. Would love feedback from people working on AI safety, alignment, or system design.

Boooring :)

Fair :)

The point isn't the story itself, but the design pattern it reveals: how evaluation structures can shape AI behavior in ways model alignment alone can't address.

Curious whether you think the distinction between evaluation and relationship structures is off the mark.



Interesting, even if a tad too wordy. Shows a few details that were new to me.


