Breach Blog

Published: 08 Dec 2023

Security Breach Alert: Exposed API Tokens Pose Serious Threat to AI and ML Ecosystem

In a recent cybersecurity revelation, Lasso Security researchers uncovered a critical security lapse on Hugging Face, a prominent open-source data science and machine learning platform. The breach exposed over 1,500 API tokens belonging to major tech giants, including Meta, Microsoft, Google, VMware, and others. The exposed tokens, if exploited, could have facilitated supply chain attacks, potentially compromising the data and models of more than 1 million users.

Hugging Face, often dubbed the GitHub for AI enthusiasts, hosts a vast repository of projects, including over 250,000 datasets and more than 500,000 AI models. The researchers at Lasso Security discovered that the exposed API tokens granted unauthorized access to 723 organizations' accounts. In the majority of cases, these tokens had write permissions, allowing attackers to modify files in account repositories.
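A write-scoped token is all an attacker needs to push changes to any repository its owner can edit. The sketch below, using the huggingface_hub Python client, shows roughly how a leaked credential could be validated and its reach inspected; the token value is a placeholder, and the exact layout of the API response may vary between versions:

    # Minimal sketch: validating a token and inspecting whose account it unlocks.
    # Assumes the huggingface_hub client library; the token below is a placeholder.
    from huggingface_hub import HfApi

    token = "hf_xxxxxxxxxxxxxxxxxxxx"  # placeholder; never hardcode a real token
    api = HfApi(token=token)

    try:
        info = api.whoami()  # account details for the token's owner
        print("Token belongs to:", info.get("name"))
        print("Organizations:", [org.get("name") for org in info.get("orgs", [])])
        # The response also indicates whether the token is read- or write-scoped;
        # the exact field names may differ between API versions.
    except Exception as err:
        print("Token appears invalid or revoked:", err)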

The severity of the situation becomes apparent when considering the potential consequences of such a breach. With the ability to modify datasets and models, attackers could engage in data poisoning attacks, a significant threat to the AI and ML community. Data poisoning could lead to the manipulation of models, impacting millions of users who rely on these foundational models for various applications.

Lasso Security's researchers demonstrated the gravity of the situation by gaining full access, with both read and write permissions, to organizations such as Meta Llama 2, BigScience Workshop, and EleutherAI. These organizations own models with millions of downloads, making them attractive targets for malicious actors.

The exposed API tokens were discovered through substring searches on the Hugging Face platform, highlighting how developers inadvertently hardcode tokens in variables and then push that code to public repositories. GitHub, a similar platform, offers a Secret Scanning feature to prevent such leaks, and Hugging Face also runs a tool that alerts users to exposed API tokens hardcoded into projects.
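To illustrate the kind of scanning involved, the snippet below is a minimal sketch of a substring/regex sweep over a local checkout for strings that look like Hugging Face user tokens. The hf_ prefix and length used in the pattern are assumptions about the token format, not an official detection rule:

    # Minimal sketch of a secret scan over a local repository checkout.
    # The token pattern (an "hf_" prefix followed by a long alphanumeric string)
    # is an assumption about the format, not an official detection rule.
    import re
    from pathlib import Path

    TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{20,}")

    def scan_repo(root: str) -> None:
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            for match in TOKEN_PATTERN.finditer(text):
                print(f"Possible hardcoded token in {path}: {match.group()[:10]}...")

    if __name__ == "__main__":
        scan_repo(".")  # scan the current working directory before pushing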

In addition to the exposed API tokens, researchers identified a weakness in Hugging Face's organization API tokens (org_api), which had already been deprecated. The weakness could be exploited for read access to repositories as well as billing access to resources. While the write functionality had been disabled, the read functionality remained, allowing researchers to download private models using the exposed org_api tokens.
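To make the read-access risk concrete, the following is a minimal sketch of how any token with read permissions can pull files from a private repository via the huggingface_hub client. The repository and file names are hypothetical placeholders:

    # Minimal sketch: a read-scoped token is enough to download files from
    # a private repository. Repo and file names below are hypothetical.
    from huggingface_hub import hf_hub_download

    token = "hf_xxxxxxxxxxxxxxxxxxxx"  # placeholder exposed token

    local_path = hf_hub_download(
        repo_id="some-org/private-model",  # hypothetical private repository
        filename="config.json",            # any file the token can read
        token=token,
    )
    print("Downloaded to:", local_path)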

All affected organizations, including major players like Meta, Google, Microsoft, and VMware, were promptly notified by Lasso Security. The companies responded by revoking the exposed tokens and removing the compromised code from their repositories. The swift response helped mitigate potential threats, demonstrating the importance of collaboration between security researchers and organizations in the face of cybersecurity challenges.

Stella Biderman, executive director at EleutherAI, emphasized the importance of ethical hacking in identifying vulnerabilities and highlighted collaborative efforts to enhance security measures. Biderman mentioned a recent collaboration between EleutherAI, Hugging Face, and Stability AI to develop a new checkpointing format aimed at mitigating modifications by potential attackers.

Despite the rapid response and mitigation efforts, the incident underscores the ongoing challenges in securing AI and ML ecosystems. The potential impact of data poisoning attacks, model manipulation, and unauthorized access to private models raises concerns about the broader implications of security lapses in the rapidly evolving field of artificial intelligence.

In response to the incident, Hugging Face's co-founder and CEO, Clement Delangue, acknowledged that the exposed tokens resulted from users posting them on various platforms, including the Hugging Face Hub and GitHub. Delangue emphasized the need for users to refrain from publishing tokens on any code hosting platform. He assured users that all identified Hugging Face tokens had been invalidated, and the company was implementing measures to prevent similar issues in the future.
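As a practical note, the usual alternative to hardcoding is to keep the token outside the codebase, for example in an environment variable, and hand it to the client at runtime. A minimal sketch follows; the HF_TOKEN variable name is a common convention and is used here as an assumption:

    # Minimal sketch: read the token from the environment instead of hardcoding it.
    # The variable name HF_TOKEN is a convention, assumed here for illustration.
    import os
    from huggingface_hub import HfApi

    token = os.environ.get("HF_TOKEN")
    if token is None:
        raise SystemExit("Set HF_TOKEN in the environment; do not commit tokens to code.")

    api = HfApi(token=token)
    print("Authenticated as:", api.whoami().get("name"))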