Cerebras Systems sets record for the largest AI models ever trained on a single device

Cerebras Systems has set a record for the largest AI models ever trained on a single device, which in this case is a huge silicon wafer with hundreds of thousands of cores.

Cerebras builds one large chip out of an 8.5-inch-wide silicon wafer that would normally be divided into hundreds of chips. The word "device" will have to do, as no one else makes such a massive chip, with 850,000 cores and 2.55 trillion transistors.

The advantage of a dinner-plate-sized wafer

Cerebras said that a single CS-2 system with one Cerebras wafer can train models with up to 20 billion parameters, a feat not possible on any other single device. A CS-2 system is 26 inches tall and fits inside a standard datacenter rack.

Cerebras reduces the systems-engineering time required to run large NLP models from months to minutes, eliminating one of the most painful aspects of large-scale NLP: partitioning the model across hundreds or thousands of small graphics processing units (GPUs).

In a conversation, Andrew Feldman, the CEO of Cerebras Systems, said that setting up one of these models for training takes about 16 keystrokes.

The disadvantage of using GPUs with AI models

Feldman said that bigger models are shown to be more accurate for NLP. However, few companies have the expertise and resources to take on the task of breaking up these large models and spreading them across hundreds or thousands of GPUs, the main computing rival to Cerebras' devices.

He said that every network must be reorganized and redistributed, and all the work done over again, for each cluster. If you change even one GPU in that cluster, you have to redo all the work. If you want to move the model to another cluster, you must redo all the work.

Cerebras is democratizing access to some of the world's largest AI capabilities, according to Feldman.

GSK provides very large datasets through its genomic and genetic research, and these datasets require additional equipment to conduct machine learning, according to Kim Branson, the senior vice president of AI and machine learning at GSK. The Cerebras CS-2 is a significant component that allows GSK to develop language models by employing biological datasets at a scale and size previously impossible. These foundational models form the basis of many of our AI systems and play a vital role in the discovery of transformational medicines.

These capabilities are enabled by a combination of the size and computational resources of the Cerebras Wafer Scale Engine-2 (WSE-2) and the Weight Streaming software architecture extensions released in version R1.4 of the Cerebras Software Platform, CSoft.

AI training is simple when a model fits on a single processor. But when a model has more parameters than fit in memory, or a layer requires more compute than a single processor can handle, the complexity explodes. The model must be broken up and spread across hundreds or thousands of GPUs, a process that is usually painful.
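
For readers unfamiliar with what that partitioning involves, here is a minimal sketch of manual model parallelism in PyTorch, assuming a machine with two CUDA devices; the layer sizes and device mapping are illustrative only, not how any particular large model is actually sharded.

```python
# Minimal sketch of manual model parallelism in PyTorch (illustrative only).
# Assumes a machine with at least two CUDA devices.
import torch
import torch.nn as nn

class ShardedMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Each shard is pinned to a device by hand. For a multi-billion-
        # parameter model this mapping spans hundreds or thousands of GPUs
        # and must be re-derived whenever the model or cluster changes.
        self.shard0 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
        self.shard1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.shard0(x.to("cuda:0"))
        # Activations are copied between devices at every shard boundary.
        return self.shard1(x.to("cuda:1"))

model = ShardedMLP()
out = model(torch.randn(8, 4096))  # input starts on the CPU
```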

Feldman said the company has taken something that used to take months to complete and turned it into 16 keystrokes.
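
To give a sense of scale, switching to a much larger model can amount to editing a handful of values in a configuration file. The snippet below is a hypothetical illustration with invented field names, not Cerebras' actual CSoft parameter format; the values shown correspond to publicly documented GPT-style model sizes.

```python
# Hypothetical illustration: scaling a GPT-style model by editing a few
# configuration values. The field names are invented for this sketch and
# are not Cerebras' actual CSoft schema.
config_1_3b = {"hidden_size": 2048, "num_layers": 24, "num_heads": 16}  # ~1.3B params
config_20b  = {"hidden_size": 6144, "num_layers": 44, "num_heads": 64}  # ~20B params
```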

Reducing the need for systems engineers

To make matters worse, each partitioning is unique to a particular pairing of neural network and compute cluster, so the work is not portable to different compute clusters or across neural networks. It's entirely bespoke, and it's why companies publish papers when they achieve this feat. It's a huge systems-engineering problem, and it's not something machine learning experts are trained to do.

Feldman said the announcement expands access to the largest models for any organization by demonstrating that they can be trained easily and cheaply on a single device.

That is because spreading a large neural network over a cluster of GPUs is extremely difficult, he said.

He added that it's a multi-dimensional Tetris problem, where you must divide up compute, memory, and communication and distribute them across hundreds or thousands of graphics processing units.

The largest processor ever built

The Cerebras WSE-2 is the largest processor ever built: 56 times larger than the biggest GPU, with 2.55 trillion more transistors and 100 times as many compute cores. The size and computational resources of the WSE-2 allow memory (which is used to store parameters) to grow separately from compute. Hence, a single CS-2 can support models with hundreds of billions, even trillions, of parameters.
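
As a rough back-of-the-envelope illustration of why that matters (our arithmetic, not a figure from Cerebras), the weights of a 20-billion-parameter model in 16-bit precision alone exceed the on-board memory of a typical GPU:

```python
# Back-of-the-envelope arithmetic (illustrative, not Cerebras' figures):
# memory needed just to hold a 20B-parameter model's weights in fp16.
params = 20e9
bytes_per_param = 2                       # 16-bit (fp16/bf16) precision
weight_gb = params * bytes_per_param / 1e9
print(f"{weight_gb:.0f} GB of weights")   # 40 GB, before gradients and optimizer state
```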

Feldman said, just by way of a reminder, that Cerebras has 123 times more cores, 1,000 times more memory, and 12,000 times more memory bandwidth than a GPU solution, and that the company developed a weight-streaming technique that allows it to keep the weights off the wafer entirely.

According to Feldman, graphics processing units have a fixed amount of memory per GPU. If the model requires more parameters than fit in memory, one must buy additional graphics processors and spread the work over multiple GPUs. The result is an explosion of complexity. The Cerebras solution is far simpler and more elegant: by disaggregating compute from memory, the Weight Streaming architecture allows models with enormous numbers of parameters to run on a single CS-2.
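
To make the idea concrete, here is a conceptual sketch of weight streaming in plain NumPy, in which parameters live in an external store and are fetched one layer at a time; every name in it is an assumption made for illustration, not Cerebras' actual CSoft API.

```python
# Conceptual sketch of weight streaming (illustrative only, not Cerebras'
# actual API): weights live in an external parameter store and stream to
# the device one layer at a time, so model size is bounded by external
# memory rather than on-device memory.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the external parameter store; here it is just host RAM.
external_store = {f"layer{i}": rng.standard_normal((512, 512)) * 0.02
                  for i in range(8)}

def forward(x):
    for i in range(8):
        w = external_store[f"layer{i}"]  # stream this layer's weights in
        x = np.maximum(x @ w, 0.0)       # compute with only one layer resident
        # w is released here; the device never holds the full model at once
    return x

out = forward(rng.standard_normal((4, 512)))
```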

Revolutionizing setup time and portability

Cerebras is able to support, on a single system, the most important NLP networks, according to Feldman. By enabling these networks on a single CS-2, Cerebras reduces setup time to minutes, enabling model portability. This task would take months of engineering time to achieve on a cluster of hundreds of GPUs.

Cerebras' ability to bring large language models to the masses with cost-effective, easy access opens up an exciting new era in AI. It gives organizations that can't spend tens of millions of dollars an easy and inexpensive on-ramp to major-league NLP, according to Dan Olds, chief research officer at Intersect360 Research. It will be interesting to see the new applications and discoveries CS-2 customers make as they train GPT-3 and GPT-J class models on massive datasets.

Worldwide adoption

Cerebras has grown to include customers in North America, Asia, Europe, and the Middle East, delivering AI solutions to meet growing demand from businesses, governments, and the high-performance computing (HPC) sector. Its customers include GSK, AstraZeneca, TotalEnergies, Lawrence Livermore National Laboratory, Leibniz Supercomputing Centre, the Edinburgh Parallel Computing Centre (EPCC), the National Energy Technology Laboratory, and Tokyo Electron Devices.

According to Feldman, these customers are out there saying really nice things about the company. AstraZeneca said that training runs that would have taken weeks on GPU clusters were achieved in a few days.

Cerebras is capable of performing work ten times faster than 16 GPUs, according to GSK.

According to Feldman, the amount of compute used in these large language models has increased exponentially, and so far the models have been accessible to only a small portion of the market. This move opens them up to the vast majority of the economy, giving any organization access to the most powerful models.
