DeepSeek: "As good as a Ferrari, but built from spare parts" – former Microsoft expert explains the basis of its immense success


DeepSeek is being celebrated as the better ChatGPT. An ex-Microsoft employee explains why the new AI chatbot is a bombshell.

A Sputnik moment – Former Microsoft employee Dave Plummer talks about the Chinese AI chatbot DeepSeek on his YouTube channel Dave’s Garage. He explains what makes DeepSeek so special compared with competitors such as ChatGPT.

Like other LLMs, DeepSeek filters its responses on problematic topics and refuses, for example, to answer questions about drug prescriptions. DeepSeek is also subject to state censorship in China. For example, the AI avoids statements about the Tiananmen massacre, the oppression of the Uyghurs in the Xinjiang region or the political independence of Taiwan.

Answers that include the name of the Chinese head of state, Xi Jinping, are also withheld. This censorship is actively checked by a Chinese authority, but it can be tricked with a suitably phrased question or circumvented by running the LLM locally.

A Ferrari made from spare parts

DeepSeek was reportedly much cheaper to train than ChatGPT and comparable LLMs. However, exactly how much cheaper it was is disputed.

On his YouTube channel, former Microsoft employee Plummer, like other sources, cites a figure of under $6 million that is said to have been invested in DeepSeek. That is a fraction of the billions invested in the competitors’ models. Nevertheless, the AI can keep up with flagships like ChatGPT.

Furthermore, the developers of the AI are said to have had no access to the latest Nvidia chips. The performance of these chips is considered so integral to the AI boom that Nvidia has become one of the most valuable corporations, even though the product many people know Nvidia for, namely gaming graphics cards, now ranks only as a minor product for the company.

According to Plummer, DeepSeek is like a Ferrari built from spare parts:

A Ferrari built from spare parts – just as good, but much cheaper.

Like master and apprentice

This is made possible by a different type of training. Like ChatGPT, DeepSeek is a large language model. However, it is a distilled model.

This means that a smaller model is trained using large models to deliver results that are as similar as possible to those of the large models, but with far fewer resources.

As a result, the huge models still have a larger knowledge base, but the smaller model performs almost as well in most applications.
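To make the idea of distillation more concrete, here is a minimal sketch of the classic "soft target" training signal from the general knowledge-distillation literature. It is not confirmed to be the method DeepSeek used. The function names, the temperature value and the example scores are all assumptions chosen for illustration. The loss is small when the small model's output distribution resembles the large model's, and large when it does not.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Hypothetical helper for illustration only. Scaling by temperature**2 is
    the convention from the original knowledge-distillation formulation.
    """
    p = softmax(teacher_logits, temperature)  # the "master"
    q = softmax(student_logits, temperature)  # the "apprentice"
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Made-up scores over three possible next tokens.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # imitates the teacher well
far_student = [0.5, 1.0, 4.0]    # prefers a different token

print("close student loss:", distillation_loss(teacher, close_student))
print("far student loss:  ", distillation_loss(teacher, far_student))
```

During training, the smaller model's parameters would be adjusted to drive this loss down across many examples, so that it learns to reproduce the larger model's behavior rather than learning everything from scratch.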

Plummer compares this in his video as follows:

It’s like a master training his apprentice – the apprentice doesn’t need to know everything, but they can do the job just as well.

These masters included Meta’s open-source model Llama as well as OpenAI’s ChatGPT.

This distillation of knowledge makes DeepSeek significantly more resource-efficient. It also no longer requires the immense hardware, with hundreds of GPUs in huge data centers, that the large models need.

But this also leads Plummer to the question:

If you can build a Ferrari in your garage out of Chevy parts, what does that mean for the value of a Ferrari?

Remembering the PC revolution

For the expensive original Ferrari, this doesn’t bode well at first.

However, it is an advantage for users that they can run the model locally on their home hardware. Of course, DeepSeek can’t run locally on every small work notebook. For the largest DeepSeek model, Plummer needs an AMD Threadripper with an Nvidia RTX 6000 GPU (48 GB VRAM). Smaller variants would even run on a MacBook Pro.
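Whether a model variant fits on a given machine can be roughly estimated from its parameter count and the precision of its weights. The sketch below shows that back-of-the-envelope arithmetic. The parameter counts, the bit widths and the 20 percent overhead factor are illustrative assumptions, not figures from Plummer's video, and real memory use also depends on context length and the runtime used.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

def fits_in_vram(params_billion, bits_per_weight, vram_gb, overhead=1.2):
    """Rough check; `overhead` is an assumed allowance for activations and caches."""
    needed = weight_memory_gb(params_billion, bits_per_weight) * overhead
    return needed <= vram_gb

VRAM_GB = 48  # the GPU memory mentioned in the article

# Hypothetical model sizes (billions of parameters) and weight precisions.
for params in (7, 32, 70):
    for bits in (16, 8, 4):
        verdict = "fits" if fits_in_vram(params, bits, VRAM_GB) else "does not fit"
        print(f"{params}B parameters at {bits}-bit: "
              f"about {weight_memory_gb(params, bits):.1f} GB of weights, {verdict}")
```

Under these assumptions, lowering the weight precision is what brings larger variants within reach of a single workstation GPU, while small variants fit comfortably even at full precision.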

This development reminds him of the days of the PC revolution.

It reminds me of the early days of PCs – they weren’t as good as mainframes, but they changed the world.

When computers were still mainframes used for business applications, no one could have imagined that everyone would eventually have one at home at an affordable price.

A Sputnik moment

Unlike the PC revolution, however, the geopolitical implications must also be considered today. DeepSeek is a Chinese model that competes in particular with US Silicon Valley companies.

Plummer refers to this as a Sputnik moment. This is a reference to the Soviet satellite Sputnik, whose launch in 1957 marked the beginning of the Space Race and a new phase of systemic competition between the Soviet Union and the West in the Cold War.

Similarly, the development of DeepSeek is leading to geopolitical tensions that reflect the competition between the USA as a technological hegemon and China as an emerging world power.

This technological competition is not only a battle for innovation, but also a symbol of the systemic rivalry between the USA’s capitalist democracy and China’s state capitalism.