In a recent cybersecurity revelation, Lasso Security researchers uncovered a critical security lapse on Hugging Face, a prominent open-source data science and machine learning platform. The breach exposed more than 1,500 API tokens belonging to major technology companies, including Meta, Microsoft, Google, and VMware. If exploited, the tokens could have facilitated supply chain attacks, potentially compromising the data and models of more than 1 million users.
Critical Security Breach
Over 1,500 API tokens from major tech companies were exposed on Hugging Face, granting unauthorized access to 723 organizations' accounts with write permissions—enabling potential data poisoning attacks affecting millions of AI/ML users.
The Scope of the Hugging Face Security Breach
Hugging Face, often dubbed the GitHub for AI enthusiasts, hosts a vast repository of projects, including over 250,000 datasets and more than 500,000 AI models. The researchers at Lasso Security discovered that the exposed API tokens granted unauthorized access to 723 organizations' accounts. In the majority of cases, these tokens had write permissions, allowing attackers to modify files in account repositories.
The severity of the situation becomes apparent when considering the potential consequences of such a breach. With the ability to modify datasets and models, attackers could engage in data poisoning attacks, a significant threat to the AI and ML community. Data poisoning could lead to the manipulation of models, impacting millions of users who rely on these foundational models for various applications.
Critical Vulnerabilities Exposed
Lasso Security's researchers were able to demonstrate the gravity of the situation by gaining full access, both read and write permissions, to organizations such as Meta Llama 2, BigScience Workshop, and EleutherAI. These organizations own models with millions of downloads, making them susceptible to potential exploitation by malicious actors.
Major Organizations Compromised
- Meta: Llama 2 and other foundation models with millions of downloads
- Microsoft: Azure AI and machine learning infrastructure access
- Google: AI research models and datasets
- VMware: Cloud and virtualization platform credentials
- BigScience Workshop: Large language models and research projects
- EleutherAI: Open-source AI models used globally
How the Breach Occurred
The exposed API tokens were discovered through simple substring searches on the Hugging Face platform, highlighting how developers inadvertently hardcode tokens in variables and push them to public repositories without securing them. GitHub, a similar platform, offers a Secret Scanning feature to prevent such leaks, and Hugging Face runs its own tool that alerts users to API tokens hardcoded into projects; even so, the credentials remained exposed.
Vulnerabilities Identified
- Hardcoded API Tokens: Developers accidentally pushing tokens to public repositories
- Write Permissions: Majority of exposed tokens allowed file modification in repositories
- Deprecated org_api Weakness: Legacy tokens still providing read and billing access
- Substring Search Detection: Simple searches revealed exposed credentials
- Supply Chain Risk: Compromised models could affect millions of downstream users
- Private Model Access: Exposed tokens allowed unauthorized downloads of proprietary models
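The detection method described above, simple pattern searches over public code, can be sketched in a few lines. This is a minimal illustration, not a production scanner: it assumes Hugging Face user tokens follow an `hf_` prefix convention (the exact token format is an assumption here), and real deployments should rely on dedicated tooling such as GitHub's Secret Scanning.

```python
import re

# Assumed pattern: tokens beginning with "hf_" followed by a long
# alphanumeric suffix. Adjust the regex for the actual token formats
# your platform issues.
TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{20,}")

def scan_for_tokens(source: str) -> list[str]:
    """Return candidate hardcoded tokens found in a source string."""
    return TOKEN_PATTERN.findall(source)

# A snippet with a (fake) hardcoded token, as a developer might push it.
snippet = 'api = HfApi(token="hf_abc123def456ghi789jkl012")'
print(scan_for_tokens(snippet))  # flags the embedded credential
```

The same pattern match run across an entire public repository is essentially what turned up the 1,500+ exposed tokens.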
The Threat of Data Poisoning
In addition to the exposed API tokens, researchers identified a weakness in Hugging Face's organization API tokens (org_api), which had already been deprecated. This weakness could be exploited for read access to repositories and billing access to resources. While the write functionality had been disabled, the read functionality remained, allowing researchers to download private models using the exposed org_api tokens.
Data poisoning attacks represent one of the most insidious threats in AI and ML. By modifying training datasets or model weights, attackers can:
- Inject backdoors into AI models that activate under specific conditions
- Introduce biases that skew model outputs and decisions
- Manipulate recommendations and predictions to benefit malicious actors
- Compromise the integrity of research and commercial applications
- Undermine trust in open-source AI/ML ecosystems
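To make the poisoning threat concrete, here is a toy, self-contained illustration (not drawn from the incident itself) of training-data injection against a simple nearest-centroid classifier: a single crafted outlier added to one class's training data shifts its centroid enough to flip a prediction.

```python
def centroid(points):
    """Mean of a list of 1-D feature values."""
    return sum(points) / len(points)

def classify(x, centroids):
    """Assign x to the label whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Clean training data: two well-separated classes.
clean = {"A": [1.0, 1.2, 0.8], "B": [5.0, 5.2, 4.8]}
clean_centroids = {lbl: centroid(pts) for lbl, pts in clean.items()}

# Poisoned data: an attacker with write access injects one extreme
# outlier into class A, dragging its centroid far from the real cluster.
poisoned = {"A": [1.0, 1.2, 0.8, -20.0], "B": [5.0, 5.2, 4.8]}
poisoned_centroids = {lbl: centroid(pts) for lbl, pts in poisoned.items()}

x = 2.0
print(classify(x, clean_centroids))     # "A" on clean data
print(classify(x, poisoned_centroids))  # "B" after poisoning
```

Real models are far more complex, but the principle scales: write access to datasets or weights is write access to the model's behavior.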
Swift Response and Mitigation
All affected organizations, including major players like Meta, Google, Microsoft, and VMware, were promptly notified by Lasso Security. The companies responded by revoking the exposed tokens and removing the compromised code from their repositories. The swift response helped mitigate potential threats, demonstrating the importance of collaboration between security researchers and organizations in the face of cybersecurity challenges.
Stella Biderman, executive director at EleutherAI, emphasized the importance of ethical hacking in identifying vulnerabilities and highlighted collaborative efforts to enhance security measures. Biderman mentioned a recent collaboration between EleutherAI, Hugging Face, and Stability AI to develop a new checkpointing format aimed at mitigating modifications by potential attackers.
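The checkpoint-integrity idea described above can be approximated at its simplest with content hashing: publish a digest alongside a model artifact and verify it before loading. A purpose-built checkpointing format does considerably more than this; the sketch below only shows the basic digest check.

```python
import hashlib
import hmac

def sha256_digest(data: bytes) -> str:
    """Hex SHA-256 digest of a model artifact's bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_artifact(data: bytes, expected_digest: str) -> bool:
    """Check a downloaded artifact against a published digest.

    Uses a constant-time comparison to avoid leaking digest bytes
    through timing differences.
    """
    return hmac.compare_digest(sha256_digest(data), expected_digest)

weights = b"model-weights-v1"            # stand-in for real weight bytes
published = sha256_digest(weights)       # digest published by the maintainer

print(verify_artifact(weights, published))              # True: untouched
print(verify_artifact(b"tampered-weights", published))  # False: modified
```

Digests only help if they are distributed over a channel the attacker cannot also write to; digital signatures close that gap.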
Essential Security Measures for AI/ML Platforms
Despite the rapid response and mitigation efforts, the incident underscores the ongoing challenges in securing AI and ML ecosystems. The potential impact of data poisoning attacks, model manipulation, and unauthorized access to private models raises concerns about the broader implications of security lapses in the rapidly evolving field of artificial intelligence.
Best Practices for Securing AI/ML Infrastructure
- Implement Secret Scanning: Use automated tools to detect and prevent hardcoded credentials
- Never Hardcode API Tokens: Store sensitive credentials in secure vaults or environment variables
- Use Short-Lived Tokens: Implement token rotation and expiration policies
- Apply Principle of Least Privilege: Grant minimum necessary permissions to API tokens
- Monitor Access Patterns: Detect anomalous API usage and unauthorized access attempts
- Enable Multi-Factor Authentication: Require MFA for all platform accounts
- Regular Security Audits: Conduct periodic reviews of exposed repositories and credentials
- Model Integrity Verification: Implement checksums and digital signatures for AI models
- Supply Chain Security: Verify the provenance of third-party models and datasets
- Incident Response Planning: Prepare procedures for rapid token revocation and breach containment
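The first two practices above can be sketched as follows: the token never appears in source code, a missing variable fails fast with a clear error, and anything written to logs is masked. The variable name `HF_TOKEN` and the helper names are illustrative choices, not an official convention.

```python
import os

def load_api_token(var_name: str = "HF_TOKEN") -> str:
    """Fetch an API token from the environment; never hardcode it."""
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; export it in the deployment "
            "environment or fetch it from a secrets manager."
        )
    return token

def masked(token: str) -> str:
    """Render a token safely for logs: keep only a short prefix."""
    return token[:6] + "..." if len(token) > 6 else "***"

# Demo only: a real token would come from the shell or a vault,
# never be assigned in code like this.
os.environ["HF_TOKEN"] = "hf_dummy_example_token"
print(masked(load_api_token()))  # logs "hf_dum...", never the full token
```

Pairing this with short-lived, least-privilege tokens limits the blast radius if a credential does leak.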
Industry Response and Future Implications
In response to the incident, Hugging Face's co-founder and CEO, Clement Delangue, acknowledged that the exposed tokens resulted from users posting them on various platforms, including the Hugging Face Hub and GitHub. Delangue emphasized the need for users to refrain from publishing tokens on any code hosting platform. He assured users that all identified Hugging Face tokens had been invalidated, and the company was implementing measures to prevent similar issues in the future.
Key Takeaways
The Hugging Face security breach serves as a stark reminder that AI and ML platforms are high-value targets for cyber attackers. With the ability to manipulate models that serve millions of users, the impact of a successful attack extends far beyond a single organization.
Organizations must treat API credentials with the same rigor as production database passwords, implement comprehensive secret management practices, and maintain continuous monitoring for exposed credentials. The rapid industry response demonstrates that collaboration between security researchers and platform providers is essential for maintaining trust in the AI/ML ecosystem.
Conclusion
The exposure of 1,500+ API tokens on Hugging Face highlights critical vulnerabilities in how organizations manage credentials in AI/ML development environments. As artificial intelligence continues to reshape industries, the security of AI platforms, models, and datasets becomes paramount to maintaining trust and integrity in the digital ecosystem.
This incident demonstrates that security in the age of AI requires vigilance not just at the infrastructure level, but throughout the entire development lifecycle—from credential management to model deployment and monitoring.
About Red Rabbit Security: We're a leading cybersecurity firm specializing in AI/ML security, secure development practices, and supply chain protection. Our team of certified experts helps organizations secure their artificial intelligence infrastructure, implement robust credential management, and protect against emerging threats in the rapidly evolving AI landscape.
