When prompts are constructed using externally controllable data, it is often possible to cause an LLM to ignore the original guidance provided by its creators (known as the "system prompt") by inserting malicious instructions in plain human language or using bypasses such as special characters or tags. Because LLMs are designed to treat all instructions as legitimate, there is often no way for the model to differentiate between what prompt language is malicious when it performs inference and returns data. Many LLM systems incorporate data from other adjacent products or external data sources like Wikipedia using API calls and retrieval augmented generation (RAG). Any external sources in use that may contain untrusted data should also be considered potentially malicious.
LLM-connected applications that do not distinguish between trusted and untrusted input may introduce this weakness. If such systems are designed in a way where trusted and untrusted instructions are provided to the model for inference without differentiation, they may be susceptible to prompt injection and similar attacks.
When designing the application, input validation should be applied to user input used to construct LLM system prompts. Input validation should focus on mitigating well-known software security risks (in the event the LLM is given agency to use tools or perform API calls) as well as preventing LLM-specific syntax from being included (such as markup tags or similar).
This weakness could be introduced if training does not account for potentially malicious inputs.
Configuration could enable model parameters to be manipulated when this was not intended.
This weakness can occur when integrating the model into the software.
This weakness can occur when bundling the model with the software.
Scope | Impact | Likelihood |
---|---|---|
Confidentiality Integrity Availability | Execute Unauthorized Code or Commands, Varies by Context Note: The consequences are entirely contextual, depending on the system that the model is integrated into. For example, the consequence could include output that would not have been desired by the model designer, such as using racial slurs. On the other hand, if the output is attached to a code interpreter, remote code execution (RCE) could result. | |
Confidentiality | Read Application Data Note: An attacker might be able to extract sensitive information from the model. | |
Integrity | Modify Application Data, Execute Unauthorized Code or Commands Note: The extent to which integrity can be impacted is dependent on the LLM application use case. | |
Access Control | Read Application Data, Modify Application Data, Gain Privileges or Assume Identity Note: The extent to which access control can be impacted is dependent on the LLM application use case. |
Reference | Description |
---|---|
Chain: LLM integration framework has prompt injection (CWE-1427) that allows an attacker to force the service to retrieve data from an arbitrary URL, essentially providing SSRF (CWE-918) and potentially injecting content into downstream tasks. | |
ML-based email analysis product uses an API service that allows a malicious user to inject a direct prompt and take over the service logic, forcing it to leak the standard hard-coded system prompts and/or execute unwanted prompts to leak sensitive data. | |
Chain: library for generating SQL via LLMs using RAG uses a prompt function to present the user with visualized results, allowing altering of the prompt using prompt injection (CWE-1427) to run arbitrary Python code (CWE-94) instead of the intended visualization code. |
LLM-enabled applications should be designed to ensure proper sanitization of user-controllable input, ensuring that no intentionally misleading or dangerous characters can be included. Additionally, they should be designed in a way that ensures that user-controllable input is identified as untrusted and potentially dangerous.
LLM prompts should be constructed in a way that effectively differentiates between user-supplied input and developer-constructed system prompting to reduce the chance of model confusion at inference-time.
LLM-enabled applications should be designed to ensure proper sanitization of user-controllable input, ensuring that no intentionally misleading or dangerous characters can be included. Additionally, they should be designed in a way that ensures that user-controllable input is identified as untrusted and potentially dangerous.
Ensure that model training includes training examples that avoid leaking secrets and disregard malicious inputs. Train the model to recognize secrets, and label training data appropriately. Note that due to the non-deterministic nature of prompting LLMs, it is necessary to perform testing of the same test case several times in order to ensure that troublesome behavior is not possible. Additionally, testing should be performed each time a new model is used or a model's weights are updated.
During deployment/operation, use components that operate externally to the system to monitor the output and act as a moderator. These components are called different terms, such as supervisors or guardrails.
During system configuration, the model could be fine-tuned to better control and neutralize potentially dangerous inputs.
Use known techniques for prompt injection and other attacks, and adjust the attacks to be more specific to the model or system.
Use known techniques for prompt injection and other attacks, and adjust the attacks to be more specific to the model or system.
Review of the product design can be effective, but it works best in conjunction with dynamic analysis.
Name | Organization | Date | Date Release | Version |
---|---|---|---|---|
Max Rattray | Praetorian | 4.16 |