information labs | Podcast Season: 1 - Episode: 6 / Release date: 17-1-2024 - Recording date: 12-12-2023

1:1 with Pamela Samuelson
UC Berkeley School of Law

Key Quotes

"A rule that (...) you have to keep very, very accurate records about what your training datasets are (...) is just (...) impractical if you care about (...) a large number of people instead of a few big companies being able to participate in the (...) generative AI space."
"Data basically is in a certain form in the in-copyright works that are part of the training data but the model does not embody the training data in a recognisable way. (...) It's just not the way we think about the component elements of copyright works."
"If you think [licensing] will mean that authors will be able to continue to make a living, we're talking about really small change here in terms of each author's entitlement. It's not like you're going to get $10,000 or $50,000 a year."
"The collective license idea doesn't pay attention to (...) that we're talking about billions of works, (...) billions of authors, (...) a lot of things that essentially have no commercial value."
"[Collective licensing:] it's so impractical that it's just not really feasible. (...) No question that collecting societies would (...) be the big beneficiaries of this, not the authors."
"If a voluntary licensing regime works (...), I think that's fine. (...) [A] mandate that everything be licensed (...) is kind of unrealistic."
"[Looking at Common Crawl:] there's this problem of changing the rules now. Yesterday, this was completely legal. Today, we decided copyright makes this illegal. (...) That seems a little weird."
"Engineers basically think of in-copyright works as bags of words. They don't think about it in terms of the expressive elements you and I enjoy whenever we pick up a novel or another well-written type of work."
"Another kind of practical consideration is that (...) a lot of the work on the provenance of training datasets focuses on the provenance of a dataset, not necessarily the provenance of each individual item within the training dataset."
"Are you supposed to disclose the big initial dataset or the curated dataset? The curation of that dataset has more claim to being a trade secret than maybe the training dataset more generally."
"If you think of generative AI systems as tools for human creation, then it would make sense that (...) the use of the tool in service of my vision would be something I could claim copyright on."

About Our Guest

Pamela Samuelson | Richard M. Sherman Distinguished Professor of Law and Information - UC Berkeley School of Law
Pamela Samuelson is the Richard M. Sherman Distinguished Professor of Law and Information at UC Berkeley. She is recognized as a pioneer in digital copyright law, intellectual property, cyberlaw and information policy. Professor Samuelson is a director of the internationally renowned Berkeley Center for Law & Technology. She is co-founder and chair of the board of Authors Alliance, a nonprofit organization that promotes the public interest in access to knowledge. She also serves on the board of directors of the Electronic Frontier Foundation and on the advisory boards of the Electronic Privacy Information Center, the Center for Democracy & Technology, and Public Knowledge. Professor Samuelson has published extensively on copyright, software protection and cyberlaw; her recent publications examine how generative AI and copyright law intersect.