"We picked the latter, which also gave us our performance metric - percentage of... | Hacker News


utdiscant on Dec 22, 2024 | on: How we made our AI code review bot stop leaving ni...

"We picked the latter, which also gave us our performance metric - percentage of generated comments that the author actually addresses."

This metric would go up if you left almost no comments. Wouldn't it be better to find a metric that rewards you for generating many comments that get addressed, not just for high relevance?

You even mention this challenge yourselves: "Sadly, even with all kinds of prompting tricks, we simply could not get the LLM to produce fewer nits without also producing fewer critical comments."

If that was happening, it doesn't sound like it would be reflected in your performance metric.

dakshgupta on Dec 22, 2024

Good criticism that we should pay closer attention to. Someone else pointed this out too, and since then we’ve started tracking addressed comments per file changed as well.

SomewhatLikely on Dec 22, 2024

You could probably modify the metric to addressed comments per 1000 lines of code.
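
To make the difference between the three metrics in this thread concrete, here is a minimal sketch. The `ReviewStats` type, the function names, and the numbers for the two bots are all hypothetical, made up for illustration; none of it comes from the original post.

```python
from dataclasses import dataclass


@dataclass
class ReviewStats:
    """Hypothetical aggregate counts for one review bot over the same set of PRs."""
    comments_generated: int
    comments_addressed: int
    files_changed: int
    lines_changed: int


def address_rate(s: ReviewStats) -> float:
    """Share of generated comments that the author addressed (the quoted metric)."""
    if s.comments_generated == 0:
        return 0.0
    return s.comments_addressed / s.comments_generated


def addressed_per_file(s: ReviewStats) -> float:
    """Addressed comments per file changed."""
    return s.comments_addressed / s.files_changed


def addressed_per_kloc(s: ReviewStats) -> float:
    """Addressed comments per 1000 lines of changed code."""
    return 1000 * s.comments_addressed / s.lines_changed


# Made-up numbers: both bots review the same 10 files / 2000 changed lines.
chatty = ReviewStats(comments_generated=40, comments_addressed=20,
                     files_changed=10, lines_changed=2000)
quiet = ReviewStats(comments_generated=2, comments_addressed=2,
                    files_changed=10, lines_changed=2000)

for name, stats in (("chatty", chatty), ("quiet", quiet)):
    print(name,
          address_rate(stats),
          addressed_per_file(stats),
          addressed_per_kloc(stats))
```

With these invented numbers the quiet bot wins on address rate (1.0 vs 0.5) even though it surfaces a tenth as many useful comments, while the per-file and per-1000-lines metrics rank the chatty bot higher. The volume-normalized metrics do not penalize noise on their own, so tracking them alongside the address rate seems more informative than either alone.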
