Tesla and xAI, Elon Musk's companies, are set to bring online $10 billion worth of AI training compute by the end of the year, though this may mean they are running behind the schedule Musk laid out. xAI has fired up its Memphis supercomputer with 100,000 H100 GPUs, while Tesla has unveiled its Cortex cluster, combining 50,000 H100 GPUs with Dojo chips. The cumulative cost of these new installations is estimated at well over $10 billion.
Tesla and xAI, two companies founded by Elon Musk, are projected to bring online $10 billion worth of AI training compute by the end of the year, according to Sawyer Merritt, co-founder of TwinBirch and a Tesla investor. Even so, both companies appear likely to fall slightly behind the aggressive timetable Musk has set.
In recent months, Musk and his companies have made a series of major announcements about AI supercomputers, each of which represents a substantial financial commitment.
In July, xAI began AI training on the Memphis Supercluster, which is set to feature 100,000 liquid-cooled H100 GPUs. The system's energy demand is staggering: it requires at least 150 MW, with the 100,000 H100 GPUs alone consuming approximately 70 MW. The complete cost of the system remains unclear, but the GPUs by themselves would come to about $2 billion at roughly $20,000 each, and AI GPUs typically account for about half of a complete system's cost.
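To make those figures concrete, here is a minimal sketch of the arithmetic, assuming roughly $20,000 per H100, about 700 W of draw per GPU, and GPUs making up half of a complete system's cost; these are rule-of-thumb values, not disclosed numbers:

```python
# Back-of-the-envelope arithmetic for the Memphis Supercluster figures above.
# Assumptions (not official numbers): ~$20,000 per H100, ~700 W per H100,
# and GPUs accounting for roughly half of a complete system's cost.

GPU_COUNT = 100_000
PRICE_PER_H100_USD = 20_000   # assumed per-GPU price
POWER_PER_H100_W = 700        # assumed per-GPU draw (SXM-class H100)

gpu_cost_usd = GPU_COUNT * PRICE_PER_H100_USD            # ~$2 billion in GPUs
system_cost_usd = gpu_cost_usd * 2                       # GPUs ~half the bill -> ~$4 billion
gpu_power_mw = GPU_COUNT * POWER_PER_H100_W / 1_000_000  # ~70 MW for the GPUs alone

print(f"GPU cost:    ~${gpu_cost_usd / 1e9:.1f}B")
print(f"System cost: ~${system_cost_usd / 1e9:.1f}B (if GPUs are half the total)")
print(f"GPU power:   ~{gpu_power_mw:.0f} MW of the >=150 MW the site needs")
```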
In late August, Tesla unveiled its Cortex AI cluster, which pairs 50,000 Nvidia H100 GPUs with 20,000 of the company's custom Dojo AI chips. The cluster is central to advancing Tesla's full self-driving (FSD) capabilities, making it a key asset for the company.
On the cost side, the H100-based machine is estimated at around $2 billion, while the Dojo supercomputer is thought to require at least $1 billion. That second figure may well be an underestimate, since Dojo machines are tailored specifically to Tesla's needs: each Dojo D1 cabinet consumes over 200 kW and needs a bespoke cooling distribution unit and power supply, which drives costs up considerably.
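Applying the same rule-of-thumb arithmetic to Cortex gives a lower bound of roughly $3 billion; the per-GPU price, the GPUs-are-half-the-cost assumption, and the $1 billion Dojo floor are estimates, not Tesla figures:

```python
# The same rough arithmetic applied to Tesla's Cortex cluster.
# Assumptions: ~$20,000 per H100, GPUs ~half of the Nvidia portion's cost,
# and the $1 billion floor for Dojo cited above.

H100_COUNT = 50_000
PRICE_PER_H100_USD = 20_000

h100_gpu_cost = H100_COUNT * PRICE_PER_H100_USD   # ~$1 billion in GPUs
nvidia_system_cost = h100_gpu_cost * 2            # ~$2 billion for the H100-based machine
dojo_cost_floor = 1_000_000_000                   # lower-bound estimate for Dojo

cortex_floor = nvidia_system_cost + dojo_cost_floor
print(f"Cortex, lower-bound estimate: ~${cortex_floor / 1e9:.0f}B")  # ~$3 billion
```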
In early September, xAI also brought its Colossus supercomputer online; it integrates 100,000 H100 GPUs and is expected to gain another 50,000 H100 and 50,000 H200 GPUs in the months ahead. This massive AI machine will likewise cost billions of dollars.
Between them, xAI and Tesla have likely announced more than $10 billion in AI hardware spending this year. However, installing and bringing these AI servers online takes time, which makes it hard to pin down how much the two firms will actually have spent on operational AI hardware in 2024.
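One way to see how that total emerges is to add up the rough estimates above. The per-system numbers below are back-of-the-envelope values rather than disclosed costs, and the sketch assumes Memphis and Colossus refer to the same 100,000-GPU installation; if they are separate machines, the total only grows:

```python
# Tallying the rough per-system estimates to see how the ">$10 billion" total emerges.
# All figures are this article's back-of-the-envelope numbers, not disclosed costs.
# The planned 50,000 H100 + 50,000 H200 expansion is assumed to cost at least as
# much as the original 100,000-H100 build-out.

memphis_colossus = 4e9   # 100k H100s: ~$2B in GPUs, doubled for the full system
colossus_upgrade = 4e9   # 100k more GPUs to come, priced like the first batch
cortex_nvidia    = 2e9   # 50k H100s, same rule of thumb
cortex_dojo      = 1e9   # lower-bound Dojo estimate

total = memphis_colossus + colossus_upgrade + cortex_nvidia + cortex_dojo
print(f"Announced AI hardware, rough total: ~${total / 1e9:.0f}B")  # ~$11 billion
```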
The irony of all this spending is that it still appears to lag the ambitious goal Elon Musk set in April, when he said that Tesla alone intended to invest $10 billion in AI hardware this year.
In a post on X, Musk stated, ‘Tesla will spend around $10 billion this year on combined training and inference AI, primarily for vehicles. Any company not investing at this level and doing so efficiently will struggle to compete.’
While Tesla’s Cortex AI cluster is certainly a costly project, and will become more expensive still if the company expands it with additional Dojo or Nvidia-based systems, its overall cost is unlikely to exceed $5 billion. And on the inference side, it seems implausible that the in-vehicle computers in the cars Tesla produces this year will account for the other $5 billion needed to reach Musk's figure.