Yesterday, we examined how DeepSeek has shaken up the AI landscape, particularly regarding copyright and plagiarism.

Simply put, DeepSeek has created AI models that rival the current top-tier U.S. models at a fraction of the cost and computing power. Furthermore, they released the models under an open-source license, making them available to everyone to build upon.

But, while the fallout of this release is still being felt, another question is being raised: Did DeepSeek use OpenAI (or other models) to train theirs?

According to OpenAI, there is evidence that they did. They claim that those connected to DeepSeek used OpenAI’s API to access large amounts of data and then “distill” their model from OpenAI’s output.

President Trump’s AI and Crypto Czar, David Sacks, echoed these claims. In an interview, Sacks said it was “possible” that intellectual property theft had occurred.

However, the response to the claims wasn’t exactly sympathetic.

Still, this raises a question: What is distilling, and did DeepSeek do anything unethical or illegal? It depends on who you ask.

The Basics of Distilling

AI distillation is a relatively straightforward process: a larger model, such as GPT-4o or o1, is used to train a smaller one. The goal is to transfer knowledge from the larger model to the smaller one so that the smaller model learns to craft responses similar to the teacher model's.

To be clear, this is a common practice when training AI systems. Since the “student” model doesn’t need to be nearly as large as the teacher, it can be significantly more efficient and cost-effective. It can also create heavily specialized models that do one particular task relatively well.
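The core of this teacher-student setup can be sketched in a few lines. The sketch below is a minimal, hypothetical illustration (not DeepSeek's or OpenAI's actual pipeline): the student is trained to match the teacher's "soft" probability distribution over answers, typically by minimizing the KL divergence between the two at a raised temperature. All logits and class counts here are made up for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; higher temperature => softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student's distribution q is from the teacher's p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for one input over three output classes.
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 1.0]

T = 2.0  # distillation temperature: softens both distributions
teacher_probs = softmax(teacher_logits, T)
student_probs = softmax(student_logits, T)

# The distillation loss the student would minimize during training.
loss = kl_divergence(teacher_probs, student_probs)
print("teacher:", [round(p, 3) for p in teacher_probs])
print("student:", [round(p, 3) for p in student_probs])
print(f"distillation loss: {loss:.4f}")
```

In a real training loop, this loss would be backpropagated through the student's weights over many examples, gradually pulling the student's outputs toward the teacher's.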

However, distillation usually takes place in-house. A company like OpenAI might distill one of its larger models to create a more targeted one. OpenAI claims that DeepSeek, without their permission, used one of their models as the teacher model to distill their new one.

If DeepSeek did this, it would violate OpenAI’s terms of service, which forbid users to:

“Attempt to or assist anyone to reverse engineer, decompile or discover the source code or underlying components of our Services, including our models, algorithms, or systems (except to the extent this restriction is prohibited by applicable law).”

However, as we discussed yesterday, enforcing that will be a real challenge. Not only is DeepSeek based in China, but its open-source nature means that it has already been widely distributed. The cat, as they say, is out of the bag.

But even if OpenAI has a point and a legal argument, it isn’t getting much sympathy for it.

Pure Schadenfreude

The irony of OpenAI’s complaint has not been lost on creators. DeepSeek is accused of doing to OpenAI what OpenAI did to millions of creators.

Though distillation is the term for what DeepSeek allegedly did, it also aptly describes how OpenAI “distilled” millions of webpages, nearly 200,000 pirated books and other content to create its models.

In short, OpenAI is upset that they were treated the same way they’ve treated human creators.

Even if the courts decide that OpenAI’s actions are entirely legal, the ethics don’t change. OpenAI is still mad that someone allegedly did to them what they did to human creators.

It’s true that GPT-4 cost up to $100 million to train, while DeepSeek only cost $5.6 million. However, the cost of everything GPT-4 was trained on easily exceeds $100 million. The only difference is that the expense is spread among millions of creators.

When you ignore authorship, creativity and intellectual property rights, you create a race to the bottom. If OpenAI ignores human creators, it can make an LLM for $100 million. If DeepSeek ignores OpenAI’s rights, it can make one for $5.6 million.

It’s exploitation all the way down, with diminishing costs and diminishing returns.

Bottom Line

To be clear, I don’t think DeepSeek is a “good guy” in this scenario. It relies just as heavily on copyright-protected works (and even displays them in its output). It also has significant censorship issues and major privacy concerns.

But it does come across as highly disingenuous for OpenAI to complain about DeepSeek allegedly training off its model. While that may violate OpenAI’s terms of use, it’s virtually identical to what OpenAI (and other companies) did to human-created works.

So, while OpenAI may have a legal argument, it doesn’t have much of an ethical one. I agree that it sucks to have your hard work distilled into an AI model that is meant to replace you. Nearly every author, musician, filmmaker, artist, photographer and journalist can relate to that.

Those creatives are simply giving OpenAI the same comfort that they received when they objected to the use of their work to train AI systems.

While this isn’t a victory for human creators, it is a beautiful moment of irony, one for which the term schadenfreude feels all too appropriate.
