WEBVTT

1
00:00:00.240 --> 00:00:01.440
- [Announcer] This is "Future Tech",

2
00:00:01.440 --> 00:00:03.810
where each week we
discuss the good, the bad,

3
00:00:03.810 --> 00:00:08.037
and the ugly of where tech
is headed in 2023 and beyond.

4
00:00:08.037 --> 00:00:11.910
- AI companies are getting
sued right and left.

5
00:00:11.910 --> 00:00:13.380
Joining me to talk about the lawsuits

6
00:00:13.380 --> 00:00:15.510
is Gizmodo senior
reporter, Thomas Germain.

7
00:00:15.510 --> 00:00:17.910
Tom, who's getting sued, and why?

8
00:00:17.910 --> 00:00:19.080
- Well, it's just about everyone

9
00:00:19.080 --> 00:00:21.481
who's making a major AI product.

10
00:00:21.481 --> 00:00:24.030
We're talking about
OpenAI, who made ChatGPT.

11
00:00:24.030 --> 00:00:25.731
Google has Bard.

12
00:00:25.731 --> 00:00:27.930
Meta has a couple different ones.

13
00:00:27.930 --> 00:00:29.940
Llama I think is the
one that's best known.

14
00:00:29.940 --> 00:00:31.890
They're all getting sued
for the same reason:

15
00:00:31.890 --> 00:00:34.331
people are saying that they
scraped content off the internet

16
00:00:34.331 --> 00:00:37.410
which violated both people's privacy

17
00:00:37.410 --> 00:00:38.875
and more importantly, copyright law.

18
00:00:38.875 --> 00:00:41.643
Oh, so there are two cases
filed by the same law firm

19
00:00:41.643 --> 00:00:44.610
against OpenAI and Google,

20
00:00:44.610 --> 00:00:45.758
and their argument is essentially

21
00:00:45.758 --> 00:00:48.540
that these companies scraped everything

22
00:00:48.540 --> 00:00:50.341
that was ever posted
on the entire internet.

23
00:00:50.341 --> 00:00:52.759
So these large language models,

24
00:00:52.759 --> 00:00:54.787
they need a big data set of information

25
00:00:54.787 --> 00:00:59.787
so that they can predict what
normal humanesque speech is.

26
00:01:00.180 --> 00:01:02.040
And basically what they do

27
00:01:02.040 --> 00:01:04.800
is they go through sites
like Reddit and Twitter,

28
00:01:04.800 --> 00:01:07.140
we don't know exactly where
they're getting the information,

29
00:01:07.140 --> 00:01:09.290
and just download everything
and feed it into the machine.

30
00:01:09.290 --> 00:01:11.268
The highest profile case

31
00:01:11.268 --> 00:01:13.320
comes from comedian Sarah Silverman,

32
00:01:13.320 --> 00:01:14.820
who along with a couple of other people,

33
00:01:14.820 --> 00:01:16.507
is suing Meta and OpenAI

34
00:01:16.507 --> 00:01:19.393
because she and the other plaintiffs say

35
00:01:19.393 --> 00:01:23.130
they ingested their
entire books, essentially,

36
00:01:23.130 --> 00:01:23.963
and are spitting them out.

37
00:01:23.963 --> 00:01:26.340
So it raises some really
interesting copyright questions

38
00:01:26.340 --> 00:01:28.470
about what is and isn't allowed,

39
00:01:28.470 --> 00:01:29.730
and the courts really don't know.

40
00:01:29.730 --> 00:01:31.366
We haven't figured that
out yet as a society.

41
00:01:31.366 --> 00:01:33.056
- And how does she know

42
00:01:33.056 --> 00:01:35.760
that it might have ingested her book?

43
00:01:35.760 --> 00:01:37.890
Does ChatGPT sound like Sarah Silverman?

44
00:01:37.890 --> 00:01:39.939
And was her book just floating around,

45
00:01:39.939 --> 00:01:41.600
readily available on the internet?

46
00:01:41.600 --> 00:01:44.806
- I would love if ChatGPT
sounded like Sarah Silverman.

47
00:01:44.806 --> 00:01:45.926
- [Blake] Me too.
- I think

48
00:01:45.926 --> 00:01:47.517
that would be an improvement.

49
00:01:47.517 --> 00:01:50.387
But no, apparently what they
did is they asked ChatGPT

50
00:01:50.387 --> 00:01:54.840
and I guess Meta's Llama for a summary

51
00:01:54.840 --> 00:01:56.820
of Sarah Silverman's
book, and it spat it out.

52
00:01:56.820 --> 00:01:58.349
So apparently it ingested,

53
00:01:58.349 --> 00:02:01.032
or at least Sarah Silverman says,

54
00:02:01.032 --> 00:02:03.183
an entire copy of her whole book.

55
00:02:03.183 --> 00:02:05.850
And that's kind of a weird situation.

56
00:02:05.850 --> 00:02:08.370
Because I can read, as a writer,

57
00:02:08.370 --> 00:02:10.410
I could read Sarah Silverman's
book and write you a summary.

58
00:02:10.410 --> 00:02:11.243
That's perfectly fine.

59
00:02:11.243 --> 00:02:13.680
But is it different when you feed that

60
00:02:13.680 --> 00:02:15.270
into a computer program

61
00:02:15.270 --> 00:02:18.252
and it exists somewhere in
the ether of a database?

62
00:02:18.252 --> 00:02:20.190
Is that violating copyright law?

63
00:02:20.190 --> 00:02:21.023
We don't really know.

64
00:02:21.023 --> 00:02:23.550
Sarah Silverman says
yes, but it's anybody's game.

65
00:02:23.550 --> 00:02:25.050
We'll have to see what the courts decide.

66
00:02:25.050 --> 00:02:26.610
- What is Google getting sued for?

67
00:02:26.610 --> 00:02:29.280
I know our headline was
"scraping every post

68
00:02:29.280 --> 00:02:30.900
that's ever been made on the internet".

69
00:02:30.900 --> 00:02:33.684
- Yeah, that's literally what
the lawsuit says, essentially.

70
00:02:33.684 --> 00:02:35.047
We spotted last week

71
00:02:35.047 --> 00:02:37.860
that Google updated its privacy policy,

72
00:02:37.860 --> 00:02:40.770
and said basically any public
information on the internet,

73
00:02:40.770 --> 00:02:42.243
anything that gets posted is fair game

74
00:02:42.243 --> 00:02:45.845
for Google to scrape and put
into its various AI systems,

75
00:02:45.845 --> 00:02:50.190
from Google Translate to Bard
to a number of other products.

76
00:02:50.190 --> 00:02:51.330
- That's pretty ballsy of them.

77
00:02:51.330 --> 00:02:52.770
- Yeah, which is interesting.

78
00:02:52.770 --> 00:02:55.500
The lawsuit says Google
doesn't own the whole internet,

79
00:02:55.500 --> 00:02:57.180
but it's complicated, right?

80
00:02:57.180 --> 00:02:59.341
This information is in public,
anyone can go look at it.

81
00:02:59.341 --> 00:03:01.124
But is it okay to take that

82
00:03:01.124 --> 00:03:04.410
and build it into a computer program?

83
00:03:04.410 --> 00:03:05.460
It's kind of weird.

84
00:03:05.460 --> 00:03:08.441
It's not like Bard or ChatGPT have a copy

85
00:03:08.441 --> 00:03:12.324
of Sarah Silverman's book or anything else

86
00:03:12.324 --> 00:03:14.730
inside the database.

87
00:03:14.730 --> 00:03:17.990
But there's some learning,
there's some analysis

88
00:03:17.990 --> 00:03:22.620
from that copyrighted work
that exists inside ChatGPT.

89
00:03:22.620 --> 00:03:25.083
So is that different
from me writing about it

90
00:03:25.083 --> 00:03:28.080
or someone making a painting

91
00:03:28.080 --> 00:03:29.490
because they studied Picasso

92
00:03:29.490 --> 00:03:31.093
and they do something
that looks like cubism?

93
00:03:31.093 --> 00:03:33.240
It's kind of an edge case.

94
00:03:33.240 --> 00:03:35.220
I'm not really sure
what to think about it,

95
00:03:35.220 --> 00:03:37.620
but definitely, a lot
of artists and writers

96
00:03:37.620 --> 00:03:39.924
and creative types are
pretty upset about this.

97
00:03:39.924 --> 00:03:41.332
- Have we seen any decisions

98
00:03:41.332 --> 00:03:44.040
that might set a
precedent for these cases?

99
00:03:44.040 --> 00:03:45.480
- Not that I'm aware of.

100
00:03:45.480 --> 00:03:47.310
It's a pretty new question.

101
00:03:47.310 --> 00:03:49.680
This technology has
been around for a while,

102
00:03:49.680 --> 00:03:52.200
but it's only the past year or so,

103
00:03:52.200 --> 00:03:55.023
or even eight or nine months with ChatGPT

104
00:03:55.023 --> 00:03:57.833
that this stuff has been
in an easily digestible

105
00:03:57.833 --> 00:04:01.371
public-facing program or
app that anyone can go use.

106
00:04:01.371 --> 00:04:06.330
So there's a lot, hundreds
of years of case law

107
00:04:06.330 --> 00:04:08.147
on issues about intellectual property.

108
00:04:08.147 --> 00:04:11.734
But when it comes to,
are you breaking the law

109
00:04:11.734 --> 00:04:14.686
by having your algorithm look at it

110
00:04:14.686 --> 00:04:19.110
and learn things from it, I
think really we have no idea.

111
00:04:19.110 --> 00:04:22.983
And normally, I would assume
that the US court system

112
00:04:22.983 --> 00:04:25.620
would side with big businesses.

113
00:04:25.620 --> 00:04:27.300
But here it's complicated,

114
00:04:27.300 --> 00:04:29.850
because there's two big
businesses on either side.

115
00:04:29.850 --> 00:04:33.499
We've got the publishing
industry and Getty Images,

116
00:04:33.499 --> 00:04:35.640
and then we've got OpenAI and Google.

117
00:04:35.640 --> 00:04:37.155
So which giant corporation,

118
00:04:37.155 --> 00:04:40.020
which industry are the
courts gonna side with here?

119
00:04:40.020 --> 00:04:41.160
I really couldn't tell you.

120
00:04:41.160 --> 00:04:42.480
It's an interesting question.

121
00:04:42.480 --> 00:04:43.609
- That is an interesting question.

122
00:04:43.609 --> 00:04:47.313
To read more about all these
cases, go to gizmodo.com.