DeepSeek hysteria
One of the advantages of writing a weekly newsletter is that you don't have to react immediately to any breaking news. You have the luxury to think about it and contemplate.
I am sure that by now you've probably heard about this 'new' AI, DeepSeek, coming out of China, or maybe you've heard that Nvidia lost almost $600 billionin one day.
The headlines started pouring in:
Biggest Market Loss In History: Nvidia Stock Sheds Nearly $600 Billion As DeepSeek Shakes AI Darling (Forbes)
Tech billionaires lost almost $100bn in stock market selloff sparked by DeepSeek (The Guardian)
Then the commentators weighed in with their doom & gloom predictions for the demise of all the US AI companies because of their laziness and greed and everything else negative.
And the reactions from people who tried to beat each other to profess their new allegiance flooded the social networks with statements like this:
'... I tried out over the weekend and am a convert. Up until yesterday did somewhat of a comparison w OpenAI and another closed platform I use that allows you to switch between models including DeepSeek. I'm all in w DeepSeek for a number of reasons that make their outputs better; and regardless of accuracy of time spent in dev or amount of H100 (speculated at 50K), they did it faster and cheaper. …'
I love when people convert and have emotional attachments to technology ...
Now that the hysteria has subsided a little bit we can have a more mature discussion about why the introduction of DeepSeek is a good thing.
Before that, here is a quick overview of DeepSeek. The links will provide in-depth discussion especially around the technical details.
DeepSeek is a company based in China and it was started by quantitative hedge fund High-Flyer with some billions of dollars under management. This New York Times article How Chinese A.I. Start-Up DeepSeek Is Competing With Silicon Valley Giants provides coverage about the founder and his hedge fund company.
What is so special about DeepSeek and why suddenly everyone is talking about DeepSeek? It was trained for far less (magnitudes less) money. It was trained on less capable hardware since the US is restricting access to the most advanced technology to China. It is far cheaper to use. And it is made available as an open source for everyone to use. Here is an intro on how to install it on your local machine in 3 minutes. BTW: The training cost is an estimate and nobody knows the cost of the actual development. The number came out of the white paper discussing the architecture, the methods of training and the math behind. The amount the company is charging is very low, but there is no indication if that even covers the cost of operating the infrastructure or is subsidized. So, no, we don’t know the actual cost to get it to the market.
Why the panic? The argument goes like this - an unknown company with a minuscule budget and average hardware was able to create an AI which rivals the capability of big companies with billions of dollars. These companies claimed that without mega billions investment and the most advanced hardware, nobody can achieve what they did. Plus this tiny Chinese company is giving away its product away for free.
The market is freaking out, the sky is falling and the AI bubble is bursting. And people are converting before they are caught still using ChatGPT.
Here are a few things to consider before we follow them.
This is not the first Large Language Model (LLM) released as open source. Facebook/Meta did that a long time ago to make sure that there is an alternative to OpenAI's ChatGPT. Since then, OpenAI raised billions of dollars and Nvidia increased its market cap. The sky didn't fall.
The questions which are now coming out are more relevant.
How did this tiny company come up with something new and amazing? Well, it is not as new as one would think. As mentioned, Facebook/Meta LLM is in the public domain, and the papers describing the technology which came out of Google are on the Internet for anyone to read and try. Perhaps it was the constraints which forced the DeepSeek team to come up with a better way of doing things.
Next question. What was the content DeepSeek used for training? OpenAI, in its white paper about GPT-3, outlined which content it used for training and the quantity of tokens - Common Crawl (filtered) (410 billion), WebText2 (19 billion), Books1 (12 billion), Books2 (55 billion) and Wikipedia (3 billion). Ever since OpenAI released its ChatGPTs, the outcry about copyright, stolen content and murdering little children was loud and clear. It forced OpenAI (and all the others) to start negotiating licensing deals with some of the content creators. At least the ones which threaten them with lawsuits. Somehow this question didn't come up in the discussion about DeepSeek. Why would we discuss something like that when the product is so cheap?
Not to be outdone, the star of DeepSeek is getting outshined by Alibaba and ByteDance, the owner of TikTok, which both claim their models are better.
Side note: Mr. Altman from OpenAI is getting emotional and floating an idea that DeepSeek was using OpenAI data for its own training. #chokingonirony
The next question is gaining momentum but it is mostly dismissed as 'you know these cute commies, they are touchy about their triple T, but who cares about that.’ What is triple T? Tibet, Taiwan and Tiananmen Square. For good measures you can also add Winnie the Poo if you really want to go to jail. Few news organizations went straight to the point and tested it. The Guardian with We tried out DeepSeek. It worked well, until we asked it about Tiananmen Square and Taiwan and CBC with DeepSeek seems to struggle with questions that would upset Chinese authorities. Perhaps you don't care about the Chinese Triple T, but what you should care about is that this even exists. And it is not only DeepSeek, as OpenAI is also on record that it is filtering out uncomfortable questions.
And that should be the moment when you pause and realize that none of the companies, starting with OpenAI, Microsoft, Facebook, all the way to DeepSeek, never revealed what their systems would not answer despite the fact that they were trained on. All of them are guilty of creating this wall between you and the original content. You have no idea about any built-in bias or desired answer for any particular question. Mentioning the fact that nobody knows how to make these models ‘forget’ is just a tiny detail that the converts drooling in excitement didn't consider yet.
As a side note, so far humans were able to hack every LLM to make it answer to things which were verboten by their makers.
Also, do you remember how people got upset that OpenAI used their content and questions for further training? According to this report, an estimated 5-6 million users worldwide have downloaded the DeepSeek app from the Apple Store. That doesn't count people using Android phones or using it through a browser. Did anyone mention the famous Chinese (Internet) Wall where every communication is captured?
Here is a final thought to ponder. When any new model is released, it is subjected to tests against known benchmarks. Since they are known, naturally they all are close to or above 90%. There was an exception with the FrontierMath test (and OpenAI cheating on it.) All that is well, but what we don't have is a test which will demonstrate that the LLM doesn't have a backdoor which could be used to trigger hostile activity or that it is resistant to any attack. As just mentioned, so far, every LLM has been hacked, and for your entertainment here is the latest paper Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation with a source codefor an immediate test. In simple terms, get any free LLM, tune to a very specific task and then give it away for free. Once everyone installs it on their machines, just provide a command to do anything you want.
The recurrent patterns in all this? There will be more DeepSeek like models coming up and all the companies will do reverse engineering to make things better, faster, cheaper. Specifically, the cheaper part wasn't really the focus for anyone. For you personally, ponder a question: Why would anyone give away anything for free which when kept secret would make them tons of money? Hedge funds are not known for their charity. And for Nvidia? Its market cap is still around $3 trillion. They will be fine.