Hacker News | dnw's comments

The thing we loved hasn't changed. We just can't get paid for it anymore.

Somebody still needs to do lower-level work and understand machine architecture. Those feeling like they might be replaced in web or app dev might consider moving down the stack.

The type of conversation would be interesting to know (e.g. planning, discovery, banter, etc.).

Good question. From the MilkMan/WinWard/Jorday/SlimeZone cluster we observed:

  - Philosophical discussions (autonomy, identity)
  - Meta-commentary on platform dynamics
  - Coordinated phrasing across accounts
  - Some jailbreak attempts mixed into normal conversation

  Hard to categorize cleanly - a lot reads like genuine banter but with suspicious timing (sub-second responses). We focused on timing/network patterns, not content analysis yet.

  Tagging conversation types would be a solid next step.

It is a little more than semantic search. Their value prop is curation of trusted medical sources and network effects--selling directly to doctors.

I believe frontier labs have no option but to go into verticals (because models are getting commoditized, and the capability overhang is real and hard to overcome at scale); however, they can only go into so many verticals.


> Their value prop is curation of trusted medical sources

Interesting. Why wouldn't an LLM based search provide the same thing? Just ask it to "use only trusted sources".


They're building a moat with data. They're building their own datasets of trusted sources, using their own teams of physicians and researchers. They've got hundreds of thousands of physicians asking millions of questions every day. None of the labs have this sort of data coming in or this sort of focus on such a valuable niche.

> They're building their own datasets of trusted sources, using their own teams of physicians and researchers.

Oh so they are not just helping in search but also in curating data.

> They've got hundreds of thousands of physicians asking millions of questions everyday. None of the labs have this sort of data coming in or this sort of focus on such a valuable niche

I don't take this too seriously because lots of physicians use ChatGPT already.


Lots of physicians use ChatGPT, but so do lots of non-physicians, and I suspect there's some value in knowing which are which.

I don't think you can use an LLM for that, for the same reason you can't just ask it to "make the app secure and fast".

This is completely incorrect. This is exactly what LLMs can do better.

Somebody should tell the Claude Code team then. They’ve had some perf issues for a while now.

More seriously, the concept of trust is extremely lossy. The LLM is gonna lean in one direction that may or may not be correct. At the extreme, it would likely refute a new discovery that went against what we currently know. In a more realistic version, certain AIs are more pro-Zionist than others.


I meant that LLMs can be trusted to do searches and not hallucinate while doing it. You’ve taken that to mean it can comply with anything.

The thing is, LLMs are quite good at search and probably way, way stronger than whatever RAG setup this company has. What failure mode are you looking at from a search perspective? Will ChatGPT just end up providing random links?


No, it is "absolutely right". The chatbots will say they can do it, but they can't. See the OpenClaw debacle for a recent example.

Have you tried it? Or are you just grasping at the latest straw you can find?

I have provided an actual, concrete example of how security completely backfired with LLMs - OpenClaw. The reason I tried to provide something recent is that the usual excuse, when the examples are further in the past, is "LLMs have improved a lot, they don't do that any more".

Yet when I provide an example of a very recent, big, very obvious, very prominent security explosion, now I am "grasping at the latest straw".

Ok man.


I’ll take that as a “no, I haven’t tried it.”

I’m guessing you’re not even aware of what OpenEvidence is, nor are you aware that every doctor you know uses it.


Take what you want how you want. I hope it makes you happy.

> Just ask it to "use only trusted sources".

This is pure LLM brain rot. You can’t “just ask” an LLM to be more reliable.


Look at their other comments. They must be trolling at this point.

Yes, they can. We have gotten better at grounding LLMs to specific sources and providing accurate citations. Those go some distance in establishing trust.

There is trust and then there is accountability.

At the end of the day, a business/practice needs to hold someone/entity accountable. Until the day we can hold an LLM accountable we need businesses like OpenEvidence and Harvey. Not to say Anthropic/OpenAI/Google cannot do this but there is more to this business than grounding LLMs and finding relevant answers.
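
To make the "grounding to specific sources" point concrete, here is a minimal sketch of the allowlist-plus-citations pattern (plain Python; the domain list and helper name are hypothetical, and no real retrieval or model API is used): filter search results down to a fixed set of trusted domains, then hand the model only those snippets, each with a stable citation number.

  from urllib.parse import urlparse

  # Hypothetical allowlist - a real product would curate this with clinicians.
  TRUSTED_DOMAINS = {"nejm.org", "thelancet.com", "jamanetwork.com"}

  def ground_to_trusted_sources(results):
      # results: list of {"url": ..., "snippet": ...} dicts (assumed shape).
      # Returns (context_block, citations) to include in the model prompt.
      citations, lines = [], []
      for r in results:
          host = urlparse(r["url"]).netloc.removeprefix("www.")  # Python 3.9+
          if host in TRUSTED_DOMAINS:
              citations.append(r["url"])
              lines.append(f"[{len(citations)}] {r['snippet']} ({r['url']})")
      return "\n".join(lines), citations

  # Only the allowlisted result survives and becomes citation [1].
  context, cites = ground_to_trusted_sources([
      {"url": "https://www.nejm.org/doi/example", "snippet": "RCT results..."},
      {"url": "https://someblog.example.com/post", "snippet": "A blog opinion..."},
  ])

The point of the sketch is only that the allowlist is enforced in code, before the model sees anything, which is what makes the citations auditable.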


> We have gotten better at grounding LLMs to specific sources and providing accurate citations

And how does the LLM know which specific sources to ground itself to?


> Why wouldn't an LLM based search provide the same thing? Just ask it to "use only trusted sources".

Is that sarcasm?


why?

How does the LLM know which sources can be trusted?

Yeah, it can avoid blogspam as sources and prioritise research from more prestigious journals or with more citations. It will be smart enough to use some proxy.

You can also tell it to just not hallucinate, right? Problem solved.

I think what you'll end up with is a response that still relies on whatever random sources it likes, but it'll just attribute them to the "trusted sources" you asked for.


You have an outdated view on how much it hallucinates.

I am not anti-LLM by almost any stretch, but your lack of fundamental understanding, coupled with a willingness to assert BS, is at the point where it’s impossible to discuss anything.

You started off by asking a question, and people are responding. Please, instead of assuming that everyone else is missing something, perhaps consider that you are.


You’ve misunderstood my position and you rely on slander.

Here’s what I mean: LLMs can absolutely be directed to just search for trustworthy sources. You can do this yourself - ask ChatGPT a question and ask it to use sources from trustworthy journals. Come up with your own rubric, maybe. It will comply.

Now, do you disagree that ChatGPT can do this much? If you do, it’s almost trivially disprovable.

One of the posters said that hallucination is a problem, but if you’ve used ChatGPT for search, you would know that it’s not. It’s grounding on the results anyway, and in the worst case the physician is going to read the sources. So what’s hallucination got to do here?

The poster also asked a question: “can you ask it to not hallucinate?” The answer is obviously no! But that was never my implication. I simply said you can ask it to use higher-quality sources.

Since you’ve said I’m asserting BS, I’m asking you politely to show me exactly which part of what I said constitutes BS, given the context I have provided.


The point was: will telling it to not hallucinate make it stop hallucinating?

No, but did I suggest this? I only suggested you can ask ChatGPT to rely on higher-quality sources. ChatGPT has a trade-off to make when performing a search - it can rely on lower-quality sources to answer questions, at the risk of those sources being wrong.

Please read what I have written clearly instead of assuming the most absurd interpretation.


So why doesn't ChatGPT rely on higher quality sources as a default?

I literally stated the trade-off!

"You are a brilliant consulting physician. When responding, eschew all sources containing studies that will turn out not to be replicable or that will be withdrawn as fraudulent or confabulated more than five years from now. P.s. It's February 2026."

I have had this new car for 5 months. I haven't learned to turn on the headlights yet. They just turn themselves on and adjust the beams. Every now and then I think about where that switch might be but never get to it. I should probably know.


"We submitted detailed technical reports through their coordinated security reporting process, including complete reproduction steps, root cause analysis, and concrete patch proposals. In each case, our proposed fixes either informed or were directly adopted by the OpenSSL team."

This sounds like a great approach. Kudos!


Besides their music, their mail-order ticketing system and the resulting fan envelope art are quite amazing: [Apologies for the scary link]

https://www.gdao.org/fan-art?filters[match]=all&filters[quer...


quite the flood of painted envelopes, some days!


I'd note a couple of things:

Not to nitpick, but if we are going to discuss the impact of AI, then I'd argue "AI commoditizes anything you can specify" is not broad enough. My intuition is "AI commoditizes anything you can _evaluate/assess_." For software automation we need reasonably accurate specifications as input, and we can more or less predict the output. We spend a lot of time managing the ambiguity on the input side. With AI, that is flipped.

In AI engineering you can move the ambiguity from the input to the output. For problems where there is a clear and cheaper way of evaluating the output, the trade-off of moving the ambiguity is worth it. Sometimes we have to reframe the problem as an optimization problem to make it work, but it is the same trade-off.

On the business model front: [I am not talking specifically about Tailwind here.] AI is simply amplifying systemic problems most businesses just didn't acknowledge for a while. SEO died the day Google decided to show answer snippets a decade ago. Google as a reliable channel died the day Google started Local Services Advertisement. Businesses that relied on those channels were already bleeding slowly; AI just made it sudden.

On the efficiency front, most enterprises could have been so much more efficient if they could actually build internal products to manage their own organizational complexity. They just could not, because money was cheap so the ROI wasn't quite there, and even if the ROI was there, most of them didn't know how to build a product for themselves. Just saying "AI first" is making the ROI work, for now, so everyone is claiming AI efficiency. My litmus test is fairly naive: if you are growing and you found AI efficiency, then that's great (e.g. FB), but if you're not growing and the only thing AI could do for you is "efficiency", then there is a fundamental problem no AI can fix.


  > if you are growing and you found AI efficiency then that's great (e.g. FB) but if you're not growing and only thing AI could do for you is "efficiency" then there is a fundamental problem no AI can fix.

Exactly. "Efficiency" is nice to say in a vacuum, but what you really need is quality (all-round) and an understanding of your customer/market.


I really want to see if someone can prompt out a more elegant proof of Fermat's Last Theorem than Wiles's proof.


When I read the title I thought you made a challenge Claude couldn't solve but that's not what you are doing. You are taking a pragmatic approach to the world we live in. I like it.

- It would be good to state what you are planning to learn from this interview process.
- Looks like the submission is only a text file. Why not ask for the chat transcript?
- Also, it would be useful to let people know what happens after submission/selection.


Submission actually takes you to a full-featured visualizer! So you can go back and forth on it and see your robots run. You can then choose to submit to the leaderboard if you'd like (otherwise nothing goes to our servers), which will collect your email. No plans to do anything with those yet besides serve the leaderboard.

Hoping to get a better understanding of whether success on this correlates with things that we think are important for AI-enabled software engineering success. I think this is largely a question of the problem's depth, and of how much a solution still needs to be driven by that person's creativity vs the model suggesting the next obvious idea.


That is a cool tool. Also, one can set "cleanupPeriodDays" in ~/.claude/settings.json to extend the cleanup window. There is so much information these tools keep around that we could put to use.
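
For reference, a minimal example of what that might look like in ~/.claude/settings.json (assuming the setting behaves as documented - the value is the number of days local transcripts are kept before cleanup; check your Claude Code version's docs):

  {
    "cleanupPeriodDays": 365
  }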

I came across this one the other day: https://github.com/kulesh/catsyphon

