Never Trust User Inputs -- And AI Isn't an Exception: A Security-First Approach
As AI transforms industries, security remains critical. Discover the importance of a security-first approach in AI development, the risks of open-source tools, and how Tenable's solutions can help protect your systems.
Artificial Intelligence (AI) is transforming industries and is being widely adopted by software developers to build core business applications. However, as organizations embrace these advancements, it remains critical to ensure that the security of their users, their data and their underlying infrastructure is not compromised. According to a recent survey conducted by BairesDev, nearly 72% of the software engineers interviewed are leveraging generative artificial intelligence during their development work.
In the world of cybersecurity, one critical rule is “Never trust user inputs.” This rule should be in the mind of every developer and should also be extended to AI technologies. AI systems, such as chatbots, act as intermediaries: they process user inputs and generate outputs based on them. Their outputs should therefore be treated as a new form of input and be subject to the same level of scrutiny and security measures.
This blog post delves into key security concerns, emphasizing the need for a security-first approach.
Lack of security by design in AI tools
AI tools are, most of the time, open-source and ready-to-use software designed to run locally on the developer’s machine. Many of these tools do not adhere to robust security practices by default, making them susceptible to exploitation. While analyzing some of the most common projects available on GitHub, we discovered that, for example, most of them do not offer any authentication by default, leaving them open to any user who can reach their embedded dashboards or APIs over the network. The combination of a web interface, an API and a CLI increases their attack surface.
The rapidly growing market interest in AI-related tools and applications has probably had a negative influence on their development, favoring Proof-of-Concept (PoC) software that quickly becomes popular over battle-tested software.
In this era of cloud infrastructure, when it is easy to quickly build new services or to rely on pre-existing Docker images and expose them on the Internet, it can be highly risky for an organization to leave this door open. In such situations, deploying, for example, an internal AI model on a tool lacking proper authentication could have dramatic outcomes. A recent example is when the Ollama tool allowed remote code execution (RCE) without any specific configuration other than having its API exposed.
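As an illustration of what "authentication by default" could look like, here is a minimal sketch of a bearer-token check that a service might run before handling any API request. The function name `is_authorized` and the header layout are assumptions made for this example; they are not taken from Ollama or any other tool mentioned here.

```python
import hmac


def is_authorized(headers: dict[str, str], expected_token: str) -> bool:
    """Return True only if the request carries the expected bearer token.

    Hypothetical helper: `headers` is a plain dict of request headers and
    `expected_token` is a secret loaded from configuration.
    """
    if not expected_token:
        # Fail closed: with no token configured, nothing is authorized,
        # instead of silently running "open by default".
        return False
    scheme, _, presented = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

The design choice worth noting is the fail-closed branch: a missing configuration value disables access rather than disabling the check.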
During our research, we discovered several zero-day vulnerabilities in projects that are very popular on GitHub, as measured by their star and fork counts. However, despite many coordinated disclosure attempts, the projects' maintainers have not responded in a reasonable amount of time (and sometimes not at all). We think this is evidence of the lack of security maturity in this ecosystem, which seems to favor speed of delivery to the detriment of security concerns.
While conducting our research, we found that patches for previous vulnerabilities could be bypassed, as with this NextChat Server-Side Request Forgery (SSRF) vulnerability. Our analysis of a well-known tool named Langflow also highlighted a vulnerability in the permission model implementation, allowing a low-privileged user to gain super admin privileges without any interaction.
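To make the SSRF class of issue more concrete, the following sketch shows one common mitigation: refusing to fetch a user-supplied URL unless every address it resolves to is publicly routable. It is not the NextChat patch or its bypass; `is_public_http_url` is a hypothetical helper written for this post.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_http_url(url: str) -> bool:
    """Return True if `url` is http(s) and resolves only to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(
            parsed.hostname, parsed.port or 80, proto=socket.IPPROTO_TCP
        )
    except (socket.gaierror, ValueError):
        # Unresolvable host or malformed port: reject.
        return False
    for info in infos:
        address = ipaddress.ip_address(info[4][0])
        # is_global is False for loopback, private, link-local
        # (e.g. cloud metadata endpoints) and other reserved ranges.
        if not address.is_global:
            return False
    return bool(infos)
```

A check like this is only a starting point. The later fetch should connect to the address that was validated, and every redirect should be re-checked; otherwise DNS rebinding or an open redirect can bypass it, which is the kind of gap that makes incomplete patches bypassable.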
The risks of relying on third-party LLMs
Large language models (LLMs) require substantial compute and storage resources, making it challenging for many organizations to deploy and maintain them on-premises. Consequently, it is often easier to rely on third-party providers to manage these resource-intensive models, which avoids the hassle of managing the underlying infrastructure and lets the organization focus on the business aspects. However, relying on such third-party services means entrusting these providers with potentially critical business data.
The critical risks related to such usage are real and should be handled on different levels:
- Data breach on the provider side: As with any other service, all processed data could be compromised if the provider suffers a data breach. It is crucial to vet third-party providers and ensure they adhere to privacy and data protection policies.
- Credential leakage: Accessing third-party services requires handling credentials and authentication data. As with any secret data, these credentials can be inadvertently leaked in different places, such as public Source Code Management (SCM) platforms or web application front ends.
- Model trustworthiness: Third-party services can provide numerous models to their customers, and it is critical to assess their reliability, safety, and adherence to ethical guidelines, as there is no actual guarantee that they are safe to use.
As organizations embrace these new technologies to enhance their business, they should ensure that their AI governance rules cover these risks.
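The credential leakage risk above can be partly caught before code reaches a public repository. The sketch below scans source text for quoted literals assigned to secret-looking names. The pattern and the `find_hardcoded_secrets` helper are assumptions for illustration; dedicated secret scanners use far more extensive rules.

```python
import re

# Hypothetical pattern: a name containing api_key/secret/token/password
# assigned a quoted literal of 16 or more characters.
SECRET_ASSIGNMENT = re.compile(
    r"""(?i)\b(\w*(?:api_?key|secret|token|password)\w*)\s*[:=]\s*["']([^"'\s]{16,})["']"""
)


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, variable name) for each suspicious assignment."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in SECRET_ASSIGNMENT.finditer(line):
            # Report the name only, never the secret value itself.
            findings.append((line_no, match.group(1)))
    return findings
```

Reading the key from the environment (for example with `os.environ`) does not match the pattern, which is the behavior one would want from such a check.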
The perils of inadequate datasets
More than any other technology, AI is built to fully leverage the data that it consumes. One of its goals is to ensure that organizations take full advantage of the data and knowledge gained over the years to help them move quickly in their operating field, take appropriate actions and make decisions in a shorter period of time, with a high level of confidence and accuracy.
The dataset used to train the model should be seen as an input and should be carefully analyzed. Using confidential business information might inadvertently lead to a leak through model outputs, which can cause a significant security breach. Biased data can also result in AI software making unfair or harmful decisions.
A good approach to handle model security is to focus on security considerations based on confidentiality, integrity and availability. Some examples include:
- Datasets should only include data that is safe for exposure to intended users. When possible, using data anonymization techniques can help safeguard sensitive information such as Personally Identifiable Information (PII) and decrease the risk of failing to comply with laws and regulations.
- Data collection processes should be properly implemented and monitored to ensure that data comes only from trusted sources, is accessible only to authorized users, and that the model uses the data and operates according to expectations over time.
- Data availability is crucial for the model to be trained on a complete dataset that matches business requirements. Model availability is also a concern for applications that require usage in a synchronous way. The application fallback behavior should be carefully reviewed and tested like any other failure in classic developments.
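As a small example of the anonymization point in the list above, the following sketch replaces email addresses with a placeholder before a record is added to a training dataset. The regular expression and the `redact_emails` helper are assumptions for illustration; real PII handling covers many more data types and usually relies on dedicated tooling.

```python
import re

# Simplified email pattern, good enough for a sketch, not RFC-complete.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str, placeholder: str = "[EMAIL]") -> str:
    """Replace every email address found in `text` with `placeholder`."""
    return EMAIL.sub(placeholder, text)


def redact_records(records: list[str]) -> list[str]:
    """Apply the redaction to each record before it joins the dataset."""
    return [redact_emails(record) for record in records]
```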
LLMs introduce new classes of vulnerabilities that traditional security measures may not address properly. The most prevalent AI-related vulnerabilities are prompt injection attacks, model theft and training data poisoning.
Prompt injection attacks involve malicious users crafting inputs to manipulate LLMs into generating harmful or unauthorized outputs. Remember the “Never trust user inputs” cardinal rule? In this case, the LLM acts as an intermediary between the user inputs and the system. This could result in the system disclosing sensitive information, executing malicious commands, or becoming an attack vector for other common vulnerabilities like Stored Cross-Site Scripting. As an example, Vanna.AI, a Python-based library designed to simplify SQL queries from natural language inputs, was recently identified as vulnerable to prompt injection attacks that lead to remote code execution on vulnerable systems.
Models should be protected in the same way we protect confidential and business-critical data. The first part of this blog post described how easily some AI tools can expose data to unauthorized actors. Applying defense-in-depth principles will help minimize intellectual property leakage if model theft occurs. Hardening model security with techniques such as encryption and obfuscation and having proper monitoring in place is crucial.
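One way to apply the rule to LLM output is to handle it exactly like user input: escape it before rendering and validate it against an allow-list before acting on it. The sketch below assumes an application that asks the model for a JSON object with an `action` field; the action names and both helpers are hypothetical and unrelated to Vanna.AI's API.

```python
import html
import json

# Hypothetical allow-list of actions the application is willing to perform.
ALLOWED_ACTIONS = {"summarize", "translate"}


def render_llm_text(llm_output: str) -> str:
    """Escape model output before inserting it into an HTML page."""
    return html.escape(llm_output)


def parse_llm_action(llm_output: str) -> dict:
    """Parse model output as JSON and accept only allow-listed actions."""
    try:
        data = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        raise ValueError("LLM output is not valid JSON") from exc
    action = data.get("action") if isinstance(data, dict) else None
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError("LLM requested an action outside the allow-list")
    return data
```

The point of the design is that model output is never passed to `exec`, a shell or a template unescaped; it is parsed into data and checked first.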
Finally, AI training data poisoning is a modern supply-chain attack. By altering the data used by the model, attackers can corrupt its behavior and elicit biased or harmful output, leading to direct impacts on the applications using it to achieve business goals.
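As with other supply-chain risks, a basic integrity control is to pin the dataset to a known digest and refuse to train if it has changed. The sketch below assumes the expected SHA-256 value comes from a trusted manifest recorded when the dataset was reviewed; the helper names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_is_unmodified(path: Path, expected_sha256: str) -> bool:
    """Compare the file's digest with the one recorded in a trusted manifest."""
    return hmac.compare_digest(sha256_of_file(path), expected_sha256.lower())
```

This only detects changes made after the digest was recorded. It does not help if the data was already poisoned at the source, which is why vetting and monitoring data collection remain necessary.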
As in more traditional fields, developers should always stay updated with the latest security guidelines and incorporate strategies from the OWASP Top 10 for LLMs. Techniques such as input validation, anomaly detection, and robust monitoring of the AI ecosystem's behavior can help detect and mitigate potential threats.
Balancing innovation and risk
AI technologies are promising and can transform many industries and businesses, offering innovation and efficiency opportunities. However, they represent a huge security challenge at many levels in organizations, and this should not be overlooked.
By adopting a security-first approach, following best practices and having robust governance, organizations can harness the power of AI and mitigate the emerging threats related to its adoption.
How Tenable can help
Read more about how we help secure these tools:
- Tenable Web App Scanning provides plugins to detect popular AI and LLM tools' web interfaces and vulnerabilities.
- Tenable Vulnerability Management, Tenable Security Center and Tenable Nessus plugins detect popular AI and LLM tools.
- Tenable Nessus Network Monitor plugins detect popular tools.
- Tenable's researchers help elevate the ecosystem by identifying exposures in third-party AI software and disclosing them responsibly to the vendors. Among those that have been published are NextChat Server-Side Request Forgery / Cross-Site Scripting and SSRF Security Feature Bypass in Azure AI and ML Studios.