The other reason they don’t do it is that many models are trained on a large corpus of pirated texts, and documenting this would be a confession.
Not just in an ‘I scraped The New York Times without permission’ kind of way, but in an ‘I illegally downloaded a torrent containing bestsellers from the last 30 years’ kind of way.