
  • A Multi-Layered Security and Supply Chain Risk Analysis of Hyperscale AI Networking Infrastructure

    Executive Summary

    This report provides a comprehensive security analysis of the networking infrastructure in Meta’s 24,576-GPU AI training clusters. Built on Open Compute Project (OCP) principles, the infrastructure uses Meta’s Minahou switch and the Cisco 8501 switch, both of which run FBOSS (Facebook Open Switching System), Meta’s network operating system.

    Our analysis concludes that multiple layers of security and regulatory controls are in place, but their effectiveness is limited by the nature of open-source development, the complexity of the global semiconductor supply chain, and persistent threats from sophisticated state-sponsored actors.

    Key Findings:

    • Personnel and Regulatory Gaps. The OCP does not perform national security vetting; that responsibility falls to employers such as Meta and Cisco, which are subject to U.S. export control laws, including the Export Administration Regulations (EAR).¹ However, releasing FBOSS and the OCP specifications as open source is a deliberate strategy: it places the technology in the public domain, which legally exempts it from many of these controls.¹
    • Software as a Primary Attack Vector. FBOSS follows a disaggregated model: it is a suite of applications running on a standard Linux OS, which shifts the security burden to the operator (Meta). Our review of the ecosystem found a substantial number of high-severity vulnerabilities in both vendor software (Cisco, Broadcom) and Meta’s own open-source projects, indicating that the software layer is a probable, high-risk attack vector.
    • Divergent Hardware Security. The stated security postures of the underlying Application-Specific Integrated Circuits (ASICs) diverge notably. Cisco explicitly builds its Silicon One architecture on a foundation of hardware security, including a hardware root of trust and secure boot. By contrast, public documentation for Broadcom’s Tomahawk 5 focuses overwhelmingly on performance and is conspicuously short on detail about embedded security features.
    • Concentrated Supply Chain and Geopolitical Risk. The entire hardware foundation is fabricated by a single foundry, Taiwan Semiconductor Manufacturing Company (TSMC), which has a documented history of security breaches. Furthermore, the key design and manufacturing entities (Cisco, Broadcom, and Celestica) maintain significant operations in China and are active targets of sophisticated Chinese state-sponsored cyber-espionage campaigns. Together, these factors make the supply chain the most complex and most difficult-to-secure risk domain.

    This report synthesizes these findings into a holistic risk assessment, concludes that the most acute and probable threats lie at the software and supply chain levels, and offers strategic recommendations for strengthening security posture through a defense-in-depth approach.
