In a recent cybersecurity revelation, Lasso Security researchers uncovered a critical security lapse on Hugging Face, a prominent open-source data science and machine learning platform. The breach exposed more than 1,500 API tokens belonging to major technology companies, including Meta, Microsoft, Google, and VMware. If exploited, the tokens could have facilitated supply chain attacks, potentially compromising the data and models of more than 1 million users.
Critical Security Breach
Over 1,500 API tokens from major tech companies were exposed on Hugging Face, granting unauthorized access to 723 organizations' accounts with write permissions—enabling potential data poisoning attacks affecting millions of AI/ML users.
The Scope of the Hugging Face Security Breach
Hugging Face, often dubbed the GitHub for AI enthusiasts, hosts a vast repository of projects, including over 250,000 datasets and more than 500,000 AI models. The researchers at Lasso Security discovered that the exposed API tokens granted unauthorized access to 723 organizations' accounts. In the majority of cases, these tokens had write permissions, allowing attackers to modify files in account repositories.
The severity of the situation becomes apparent when considering the potential consequences of such a breach. With the ability to modify datasets and models, attackers could engage in data poisoning attacks, a significant threat to the AI and ML community. Data poisoning could lead to the manipulation of models, impacting millions of users who rely on these foundational models for various applications.
Critical Vulnerabilities Exposed
Lasso Security's researchers were able to demonstrate the gravity of the situation by gaining full access, both read and write permissions, to organizations such as Meta Llama 2, BigScience Workshop, and EleutherAI. These organizations own models with millions of downloads, making them susceptible to potential exploitation by malicious actors.
Major Organizations Compromised
- Meta: Llama 2 and other foundation models with millions of downloads
- Microsoft: Azure AI and machine learning infrastructure access
- Google: AI research models and datasets
- VMware: Cloud and virtualization platform credentials
- BigScience Workshop: Large language models and research projects
- EleutherAI: Open-source AI models used globally
How the Breach Occurred
The exposed API tokens were discovered through simple substring searches on the Hugging Face platform, highlighting how developers inadvertently hardcode tokens in variables and push them to public repositories without securing them. GitHub, a similar platform, offers a Secret Scanning feature to prevent such leaks, and Hugging Face runs its own tool that alerts users to API tokens hardcoded into projects; even so, the credentials remained exposed.
Vulnerabilities Identified
- Hardcoded API Tokens: Developers accidentally pushing tokens to public repositories
- Write Permissions: Majority of exposed tokens allowed file modification in repositories
- Deprecated org_api Weakness: Legacy tokens still providing read and billing access
- Substring Search Detection: Simple searches revealed exposed credentials
- Supply Chain Risk: Compromised models could affect millions of downstream users
- Private Model Access: Exposed tokens allowed unauthorized downloads of proprietary models
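The detection method described above, simple pattern searches over public code, can be sketched in a few lines. This is a minimal illustration, not a production scanner: it assumes Hugging Face user tokens follow an `hf_` prefix convention (the exact token format is an assumption here), and real deployments should rely on dedicated tooling such as GitHub's Secret Scanning.

```python
import re

# Assumed pattern: tokens beginning with "hf_" followed by a long
# alphanumeric suffix. Adjust the regex for the actual token formats
# your platform issues.
TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{20,}")

def scan_for_tokens(source: str) -> list[str]:
    """Return candidate hardcoded tokens found in a source string."""
    return TOKEN_PATTERN.findall(source)

# A snippet with a (fake) hardcoded token, as a developer might push it.
snippet = 'api = HfApi(token="hf_abc123def456ghi789jkl012")'
print(scan_for_tokens(snippet))  # flags the embedded credential
```

The same pattern match run across an entire public repository is essentially what turned up the 1,500+ exposed tokens.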
The Threat of Data Poisoning
In addition to the exposed API tokens, researchers identified a weakness in Hugging Face's organization API tokens (org_api), which had already been deprecated. This weakness could be exploited for read access to repositories and billing access to resources. While the write functionality had been disabled, the read functionality remained, allowing researchers to download private models using the exposed org_api tokens.
Data poisoning attacks represent one of the most insidious threats in AI and ML. By modifying training datasets or model weights, attackers can:
- Inject backdoors into AI models that activate under specific conditions
- Introduce biases that skew model outputs and decisions
- Manipulate recommendations and predictions to benefit malicious actors
- Compromise the integrity of research and commercial applications
- Undermine trust in open-source AI/ML ecosystems
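To make the poisoning threat concrete, here is a toy, self-contained illustration (not drawn from the incident itself) of training-data injection against a simple nearest-centroid classifier: a single crafted outlier added to one class's training data shifts its centroid enough to flip a prediction.

```python
def centroid(points):
    """Mean of a list of 1-D feature values."""
    return sum(points) / len(points)

def classify(x, centroids):
    """Assign x to the label whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Clean training data: two well-separated classes.
clean = {"A": [1.0, 1.2, 0.8], "B": [5.0, 5.2, 4.8]}
clean_centroids = {lbl: centroid(pts) for lbl, pts in clean.items()}

# Poisoned data: an attacker with write access injects one extreme
# outlier into class A, dragging its centroid far from the real cluster.
poisoned = {"A": [1.0, 1.2, 0.8, -20.0], "B": [5.0, 5.2, 4.8]}
poisoned_centroids = {lbl: centroid(pts) for lbl, pts in poisoned.items()}

x = 2.0
print(classify(x, clean_centroids))     # "A" on clean data
print(classify(x, poisoned_centroids))  # "B" after poisoning
```

Real models are far more complex, but the principle scales: write access to datasets or weights is write access to the model's behavior.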
Swift Response and Mitigation
All affected organizations, including major players like Meta, Google, Microsoft, and VMware, were promptly notified by Lasso Security. The companies responded by revoking the exposed tokens and removing the compromised code from their repositories. The swift response helped mitigate potential threats, demonstrating the importance of collaboration between security researchers and organizations in the face of cybersecurity challenges.
Stella Biderman, executive director at EleutherAI, emphasized the importance of ethical hacking in identifying vulnerabilities and highlighted collaborative efforts to enhance security measures. Biderman mentioned a recent collaboration between EleutherAI, Hugging Face, and Stability AI to develop a new checkpointing format aimed at mitigating modifications by potential attackers.
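The checkpoint-integrity idea described above can be approximated at its simplest with content hashing: publish a digest alongside a model artifact and verify it before loading. A purpose-built checkpointing format does considerably more than this; the sketch below only shows the basic digest check.

```python
import hashlib
import hmac

def sha256_digest(data: bytes) -> str:
    """Hex SHA-256 digest of a model artifact's bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_artifact(data: bytes, expected_digest: str) -> bool:
    """Check a downloaded artifact against a published digest.

    Uses a constant-time comparison to avoid leaking digest bytes
    through timing differences.
    """
    return hmac.compare_digest(sha256_digest(data), expected_digest)

weights = b"model-weights-v1"            # stand-in for real weight bytes
published = sha256_digest(weights)       # digest published by the maintainer

print(verify_artifact(weights, published))              # True: untouched
print(verify_artifact(b"tampered-weights", published))  # False: modified
```

Digests only help if they are distributed over a channel the attacker cannot also write to; digital signatures close that gap.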
Essential Security Measures for AI/ML Platforms
Despite the rapid response and mitigation efforts, the incident underscores the ongoing challenges in securing AI and ML ecosystems. The potential impact of data poisoning attacks, model manipulation, and unauthorized access to private models raises concerns about the broader implications of security lapses in the rapidly evolving field of artificial intelligence.
Best Practices for Securing AI/ML Infrastructure
- Implement Secret Scanning: Use automated tools to detect and prevent hardcoded credentials
- Never Hardcode API Tokens: Store sensitive credentials in secure vaults or environment variables
- Use Short-Lived Tokens: Implement token rotation and expiration policies
- Apply Principle of Least Privilege: Grant minimum necessary permissions to API tokens
- Monitor Access Patterns: Detect anomalous API usage and unauthorized access attempts
- Enable Multi-Factor Authentication: Require MFA for all platform accounts
- Regular Security Audits: Conduct periodic reviews of exposed repositories and credentials
- Model Integrity Verification: Implement checksums and digital signatures for AI models
- Supply Chain Security: Verify the provenance of third-party models and datasets
- Incident Response Planning: Prepare procedures for rapid token revocation and breach containment
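The first two practices above can be sketched as follows: the token never appears in source code, a missing variable fails fast with a clear error, and anything written to logs is masked. The variable name `HF_TOKEN` and the helper names are illustrative choices, not an official convention.

```python
import os

def load_api_token(var_name: str = "HF_TOKEN") -> str:
    """Fetch an API token from the environment; never hardcode it."""
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; export it in the deployment "
            "environment or fetch it from a secrets manager."
        )
    return token

def masked(token: str) -> str:
    """Render a token safely for logs: keep only a short prefix."""
    return token[:6] + "..." if len(token) > 6 else "***"

# Demo only: a real token would come from the shell or a vault,
# never be assigned in code like this.
os.environ["HF_TOKEN"] = "hf_dummy_example_token"
print(masked(load_api_token()))  # logs "hf_dum...", never the full token
```

Pairing this with short-lived, least-privilege tokens limits the blast radius if a credential does leak.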
Industry Response and Future Implications
In response to the incident, Hugging Face's co-founder and CEO, Clement Delangue, acknowledged that the exposed tokens resulted from users posting them on various platforms, including the Hugging Face Hub and GitHub. Delangue emphasized the need for users to refrain from publishing tokens on any code hosting platform. He assured users that all identified Hugging Face tokens had been invalidated, and the company was implementing measures to prevent similar issues in the future.
Key Takeaways
The Hugging Face security breach serves as a stark reminder that AI and ML platforms are high-value targets for cyber attackers. With the ability to manipulate models that serve millions of users, the impact of a successful attack extends far beyond a single organization.
Organizations must treat API credentials with the same rigor as production database passwords, implement comprehensive secret management practices, and maintain continuous monitoring for exposed credentials. The rapid industry response demonstrates that collaboration between security researchers and platform providers is essential for maintaining trust in the AI/ML ecosystem.
Conclusion
The exposure of 1,500+ API tokens on Hugging Face highlights critical vulnerabilities in how organizations manage credentials in AI/ML development environments. As artificial intelligence continues to reshape industries, the security of AI platforms, models, and datasets becomes paramount to maintaining trust and integrity in the digital ecosystem.
This incident demonstrates that security in the age of AI requires vigilance not just at the infrastructure level, but throughout the entire development lifecycle—from credential management to model deployment and monitoring.
About Red Rabbit Security: We're a leading cybersecurity firm specializing in AI/ML security, secure development practices, and supply chain protection. Our team of certified experts helps organizations secure their artificial intelligence infrastructure, implement robust credential management, and protect against emerging threats in the rapidly evolving AI landscape.
