This depends on how transformative the act of encoding the data in an LLM is. If the model is overfit to the point that it can recite its training material verbatim, then it's an illegal copy of that material. But if the model can only output content that would be considered transformative had a human with knowledge of the training data created it, then the model is transformative too.
I would be really curious where this data is coming from. https://mastodon-analytics.com shows a very different user base, which might be due to different definitions of "active", but what seems really fishy is the 72% increase. What timespan are we looking at here? 72% since 2021?