Might the conclusions be correct even if some of the facts are not? Even a stopped clock is right twice a day. And, "approximately correct" is still sometimes valuable.
I think the court dropped the ball here. On the one hand, I think they were right that using existing works--copyrighted or otherwise--to train a model was transformative fair use. On the other hand, Anthropic and others trained their models on illicit copies of the works; they (more often than not) didn't pay the copyright holders.
There's a doctrine in Fourth Amendment law called "fruit of the poisonous tree." The general rule is that prosecutors don't get to present evidence in a criminal trial that they gained unlawfully. It's excluded. The jury never gets to see it even if it provides incontrovertible evidence of guilt. The point is to discourage law enforcement from violating the rights of the accused during the investigative process, and to obtain a warrant as the Amendment requires.
It seems to me that the same logic ought to be applied to these companies. They want to make money by building the best models they can. That's fine! They should be able to use all the source data they can legitimately obtain to feed their training process. But if they refuse to do so and resort to piracy, they mustn't be allowed to claim that they then used it fairly in the transformative process.
The '90s was a bit too soon for that. Most people using the Internet then were still on dialup, to the extent they were connected at all. There weren't that many DDoSes yet. Even the Trin00 DDoS in 1999 only involved 114 machines.
The unit of granularity for a CoW filesystem is a block, typically 4 KiB or smaller. The unit of granularity for S3 is the entire object, or 5 MB (the minimum multipart part size), whichever is smaller. The difference can be immense.
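To put rough numbers on that difference, here's a sketch of the write amplification for a one-byte change (illustrative sizes only; the 4 KiB block and 5 MB part figures are the typical values mentioned above):

```python
# Illustrative write-amplification math for a 1-byte change.
# Assumes a 4 KiB filesystem block and S3's 5 MB minimum multipart part size.
BLOCK = 4 * 1024               # typical CoW filesystem block
S3_MIN_PART = 5 * 1024 * 1024  # S3 minimum multipart part size

def bytes_rewritten_cow(change: int = 1) -> int:
    # A CoW filesystem only rewrites the dirty block.
    return BLOCK

def bytes_rewritten_s3(object_size: int) -> int:
    # S3 rewrites the whole object, or at least one 5 MB part,
    # whichever is smaller.
    return min(object_size, S3_MIN_PART)

gib = 1024 ** 3
print(bytes_rewritten_cow())                             # 4096
print(bytes_rewritten_s3(gib))                           # 5242880
print(bytes_rewritten_s3(gib) // bytes_rewritten_cow())  # 1280
```

So even in the best case (a multipart copy of one minimum-size part), a one-byte change to a large object rewrites on the order of a thousand times more data than a CoW filesystem would.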
Individual TCP connections don't need to live that long. Once a macOS system reaches 49.7 days of uptime, this bug starts affecting all TCP connections.
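The article doesn't spell out the root cause, but 49.7 days is the classic signature of an unsigned 32-bit millisecond counter wrapping around (my inference, not confirmed by the source):

```python
# 2^32 milliseconds expressed in days -- the wrap point of an
# unsigned 32-bit millisecond tick counter.
MS_PER_DAY = 24 * 60 * 60 * 1000
wrap_days = 2 ** 32 / MS_PER_DAY
print(f"{wrap_days:.2f} days")  # 49.71 days
```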
If the count it returns keeps growing, you're seeing a slow leak. At some point, new connections will start failing. How soon depends entirely on how quickly your machine opens and closes connections.
Since a lot of client traffic involves the server closing connections instead, I imagine it could take a while.
It's unclear whether it leaks whenever your Mac closes a connection, or only when it fails to get a FIN/ACK back from the peer and the TIME_WAIT garbage collector runs. If it's the latter, it could take substantially longer, depending on connection quality.
You can run `sysctl kern.boottime` to get when it was booted and do the math from there.
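If you want to script that math, here's a small sketch (the sample string mimics the `sysctl kern.boottime` output format; exact formatting may differ between macOS versions):

```python
import re

def uptime_days(boottime_line: str, now: float) -> float:
    """Extract the boot epoch from `sysctl kern.boottime` output and
    return uptime in days."""
    m = re.search(r"sec = (\d+)", boottime_line)
    if m is None:
        raise ValueError("unexpected kern.boottime format")
    return (now - int(m.group(1))) / 86400

# Hypothetical sample output; on a real Mac you'd feed in the actual
# output of `sysctl kern.boottime` and the current epoch time.
sample = "kern.boottime: { sec = 1700000000, usec = 0 } Tue Nov 14 22:13:20 2023"
print(uptime_days(sample, now=1700000000 + 50 * 86400))  # 50.0
```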
I also can't reproduce it. I want to say I've encountered this issue at least once: yesterday, before I rebooted, my uptime was 60 days.
But it's not instant; it just never releases connections. So you could have three years of uptime and never run out of connections, or run out shortly after hitting that threshold.
I'm just going from the bug description in the article, but it seems that, depending on your network activity, the exact time you actually notice an impact could vary quite a bit.
if it's in keepalive or retransmission timers, desktop use would mask it completely. browsers reconnect on failure, short-lived requests don't care about keepalives. you'd only notice in things that rely on the OS detecting a dead peer — persistent db connections, ssh tunnels, long-running streams.
> Am I supposed to be having issues with TCP connections right now? (I'm not.)
If my skim read of the slop post is correct, you'll only have issues on that machine if it hasn't spent any of that time asleep. (I have one Macbook that never sleeps, and I'm pretty sure it hit this bug a week or two back.)
I meant that having a connection live that long isn't necessary to trigger this bug. I know that for some workloads, it can be important for connections to live that long.
But for lisp, a more complex solution is needed. It's easy for a human lisp programmer to keep track of which closing parenthesis corresponds to which opening parenthesis because the editor highlights matching pairs as they are typed. How can we give an LLM that kind of feedback as it generates code?
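One simple form of that feedback is a balance checker run after each generation step, with its report fed back into the prompt. A minimal sketch (it ignores strings, comments, and character literals, which a real Lisp-aware checker would have to handle):

```python
def paren_feedback(code: str) -> str:
    """Return a human-readable report on parenthesis balance --
    the kind of signal one could feed back to an LLM."""
    stack = []  # indices of still-open '('
    for i, ch in enumerate(code):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                return f"unmatched ')' at index {i}"
            stack.pop()
    if stack:
        return f"{len(stack)} unclosed '(' (first at index {stack[0]})"
    return "balanced"

print(paren_feedback("(defun inc (x) (+ x 1))"))  # balanced
print(paren_feedback("(+ 1 (* 2 3)"))             # 1 unclosed '(' (first at index 0)
```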
Are they also measuring productivity? Measuring only token costs is like looking only at grocery spend but not the full receipt: you don’t know whether you fed your family for a week or for only a day.
I'm not one of those execs; I'm just echoing what I've heard from the people I've talked to who manage these dashboards and worry about this. I do think measuring productivity isn't very clear-cut, especially with these tools.
They do "attempt" to measure productivity. But they also just see large dollar amounts on AI costs and get wary.
My company is also wary of going all in with any one tool or company due to how quickly stuff changes. So far they've been trying to pool our costs across all tools together and give us an "honor system" limit we should try not to go above per month until we do commit to one suite of tools.
(Output / input), both of which are usually measured in money. If you can measure both of those things--and you have bigger problems if your finance department can't--it logically follows that you can measure productivity.
Measuring strictly in terms of money per unit time over a small enough timeframe is difficult because not all tasks directly result in immediately observed results.
There are tasks worked on at large enterprises that have 5+ year horizons, and those can't all immediately be tracked in terms of monetary gain that can be correlated with AI usage. We've barely even had AI as a daily tool used for development for a few years.