
> I don't want my code scraped and remixed by AI systems.

Just curious - why not?

Is it mostly about commercial AI violating the licenses of your repos? And if commercial scraping were banned, and scraping were only allowed for FOSS-producing AI, would you be OK with publishing again?

Or is there a fundamental problem with AI?

Personally, I use AI to produce FOSS that I probably wouldn't have produced (to that extent) without it. So for me, it's somewhat the opposite: I want to publish this work because it can be useful to others as a proof-of-concept for some intended use cases. It doesn't matter if an AI trains on it, because some big chunk was generated by AI anyway, but I think it will be useful to other people.

Then again, I publish knowing that I can't control whether some dev will (manually or automatically) remix my code commercially and without attribution. Could be wrong though.




> Just curious - why not?

Because that code is not out there for its license to be violated and for someone to earn money from it. All the choices, from the license to how it's shared, are deliberate. The code out there is written by a human, for human consumption, with strict terms to keep it open. In other words, I'm in this for fun, and my effort is not for resale, even if the resale pays me royalties, because it's not there for that.

Nobody asked for my explicit consent before scraping it. Nobody told me that it would be stripped of its license, sold, and used to make somebody rich. I found that some of my code ended up in "The Stack", which supposedly contains permissively licensed code only, but some forks of GPL repositories are there (e.g., my fork of GNOME LightDM, which contains some specific improvements).

I have been writing code for a long time. I have written a novel compression algorithm (it was not great but completely novel, and I have published it), a multi-agent autonomous trading system when multi-agent systems were unknown to most people (which is my M.Sc. thesis), and a high-performance numerical material simulation code which saturates CPUs and their memory buses to their practical limits. That code also contains some novel algorithms, one of which is also published, and it's my Ph.D. thesis as a whole.

In short, I write everything from scratch and optimize it by hand. None of that code is open, because I wanted to polish it before opening it, but it won't be opened anymore, because I don't want my GPL-licensed novel code to be scraped and abused.

> Or is there a fundamental problem with AI?

No. I work with AI systems. I support them or help design them. If the training data is ethically sourced and the model is ethically designed, that's perfectly fine. Tech is cool. How it's developed for the consumer is not. I have supported and taken part in projects which do extremely cool things with models many people scoff at and find ancient, yet these models try to warn about ecosystem/climate anomalies and keep tabs on how some ecosystems are doing. There are models which automate experiments in labs. These are cool applications which are developed ethically. There is no training data grabbed hastily from somewhere.

None of my code is written by AI. It's written by me, with sweat, blood and tears, by staring at a performance profiler or debugger, trying to understand what exactly the CPU is doing with that code. It's written by calculating branching depths, manually biasing branches to help the branch predictor, and analyzing caches to see whether I can possibly fit into a cache to accelerate that calculation even further.

If it's a small utility, it's designed for the utmost user experience. Standards-compliant flags, useful help outputs, working console detection and logging subsystems. My minimum standard is the best-of-breed software I have experienced. I aspire to reach its level and surpass it. I want my software to feel on par with it and work as snappily as the best software out there. It's not meant to be a proof of concept. I strive for a level of quality where I can depend on that software for the long run.

And what? I put that effort out there for free for people to use, just because I felt sharing it under a copyleft license was the correct thing to do.

But that gentleman's agreement is broken. Licenses are just decorative text now. Everything is up for grabs. We were a large band of friends who looked at each other's code and learnt from each other, never breaking the unwritten rules because we were trying to make something amazing for ourselves, for everyone.

Now that agreement is no more. It's the powerful's game now. Whoever has the gold makes the rules, and I'm not playing that game anymore. I'll continue to sharpen my craft and strive to write better code every time, but nobody is going to get to see the code or use it anymore.

Because it was for me from the beginning, but I wanted everyone to have access to it, and I asked for nothing except respect for the license that keeps it open for everyone. Somebody played dirty, and I'm taking my ball and going home. That's it.

If somebody wants to see a glimpse of what I do and what I strive for, see https://git.sr.ht/~bayindirh/nudge. While I might update Nudge, there won't be new public repositories. Existing ones won't be taken down.


I appreciate that you wrote this. It's a take on an issue I've been thinking about, from the perspective I was looking for.

Thanks for replying.

That's fair. I completely agree that much of LLM training was (and still very much is) in violation of many licenses. At the very least, the fact that the source of training data is obfuscated even years after the training shows that developers didn't care about attribution and licenses - if they didn't deliberately violate them outright.

Your conditions make sense. If I had anything I thought was too valuable or prone to be blatantly stolen, I would think thrice about whom I share it with.

Personally, ever since discovering FOSS, I realized that it'd be very difficult to enforce any license. The problem with public repositories is that it's trivial for those not following the gentleman's agreement to plagiarize the code. Other than recognizing blatant copy-pasting, I don't know how I'd prevent anyone from just trivially remixing my content.

Instead, I changed to seeing FOSS like scientific contributions:

- I contribute to the community. If someone remixes my code without attribution, it's unfair, but I believe that there are more good than bad contributors.

- I publish stuff that I know is personally original, i.e., I didn't remix without attribution. I can't know if some other publisher had the same idea in isolation, or remixed my stuff, but over time, provenance and plagiarism should become apparent over multiple contributions, mine and theirs.

- I don't make public anything that I can see my future self regretting. At the same time, I've always seen my economic value in continuous or custom work, not in products themselves. For me, what I produce is also a signal of future value.

- I think bad faith behavior is unsustainable. Sure, power delays the consequences, but I've seen people discuss injustice and stolen valor from centuries ago, let alone recent examples.



