CWE-1427 Software Weakness Details

Name: Improper Neutralization of Input Used for LLM Prompting

The product uses externally-provided data to build prompts provided to large language models (LLMs), but the way these prompts are constructed causes the LLM to fail to distinguish between user-supplied inputs and developer provided system directives.

CWE Description

When prompts are constructed using externally controllable data, it is often possible to cause an LLM to ignore the original guidance provided by its creators (known as the "system prompt") by inserting malicious instructions in plain human language or using bypasses such as special characters or tags. Because LLMs are designed to treat all instructions as legitimate, there is often no way for the model to differentiate between what prompt language is malicious when it performs inference and returns data. Many LLM systems incorporate data from other adjacent products or external data sources like Wikipedia using API calls and retrieval augmented generation (RAG). Any external sources in use that may contain untrusted data should also be considered potentially malicious.

General Informations

Modes Of Introduction

Architecture and Design :

LLM-connected applications that do not distinguish between trusted and untrusted input may introduce this weakness. If such systems are designed in a way where trusted and untrusted instructions are provided to the model for inference without differentiation, they may be susceptible to prompt injection and similar attacks.

Implementation :

When designing the application, input validation should be applied to user input used to construct LLM system prompts. Input validation should focus on mitigating well-known software security risks (in the event the LLM is given agency to use tools or perform API calls) as well as preventing LLM-specific syntax from being included (such as markup tags or similar).

Implementation :

This weakness could be introduced if training does not account for potentially malicious inputs.

System Configuration :

Configuration could enable model parameters to be manipulated when this was not intended.

Integration :

This weakness can occur when integrating the model into the software.

Bundling :

This weakness can occur when bundling the model with the software.

Applicable Platforms

Language

Class: Not Language-Specific (Undetermined)

Operating Systems

Class: Not OS-Specific (Undetermined)

Architectures

Class: Not Architecture-Specific (Undetermined)

Technologies

Name: AI/ML (Undetermined)

Common Consequences

Scope	Impact	Likelihood
Confidentiality Integrity Availability	Execute Unauthorized Code or Commands, Varies by Context Note: The consequences are entirely contextual, depending on the system that the model is integrated into. For example, the consequence could include output that would not have been desired by the model designer, such as using racial slurs. On the other hand, if the output is attached to a code interpreter, remote code execution (RCE) could result.
Confidentiality	Read Application Data Note: An attacker might be able to extract sensitive information from the model.
Integrity	Modify Application Data, Execute Unauthorized Code or Commands Note: The extent to which integrity can be impacted is dependent on the LLM application use case.
Access Control	Read Application Data, Modify Application Data, Gain Privileges or Assume Identity Note: The extent to which access control can be impacted is dependent on the LLM application use case.

Observed Examples

References	Description
CVE-2023-32786	Chain: LLM integration framework has prompt injection (CWE-1427) that allows an attacker to force the service to retrieve data from an arbitrary URL, essentially providing SSRF (CWE-918) and potentially injecting content into downstream tasks.
CVE-2024-5184	ML-based email analysis product uses an API service that allows a malicious user to inject a direct prompt and take over the service logic, forcing it to leak the standard hard-coded system prompts and/or execute unwanted prompts to leak sensitive data.
CVE-2024-5565	Chain: library for generating SQL via LLMs using RAG uses a prompt function to present the user with visualized results, allowing altering of the prompt using prompt injection (CWE-1427) to run arbitrary Python code (CWE-94) instead of the intended visualization code.

Potential Mitigations

Phases : Architecture and Design

LLM-enabled applications should be designed to ensure proper sanitization of user-controllable input, ensuring that no intentionally misleading or dangerous characters can be included. Additionally, they should be designed in a way that ensures that user-controllable input is identified as untrusted and potentially dangerous.

Phases : Implementation

LLM prompts should be constructed in a way that effectively differentiates between user-supplied input and developer-constructed system prompting to reduce the chance of model confusion at inference-time.

Phases : Architecture and Design

LLM-enabled applications should be designed to ensure proper sanitization of user-controllable input, ensuring that no intentionally misleading or dangerous characters can be included. Additionally, they should be designed in a way that ensures that user-controllable input is identified as untrusted and potentially dangerous.

Phases : Implementation

Ensure that model training includes training examples that avoid leaking secrets and disregard malicious inputs. Train the model to recognize secrets, and label training data appropriately. Note that due to the non-deterministic nature of prompting LLMs, it is necessary to perform testing of the same test case several times in order to ensure that troublesome behavior is not possible. Additionally, testing should be performed each time a new model is used or a model's weights are updated.

Phases : Installation // Operation

During deployment/operation, use components that operate externally to the system to monitor the output and act as a moderator. These components are called different terms, such as supervisors or guardrails.

Phases : System Configuration

During system configuration, the model could be fine-tuned to better control and neutralize potentially dangerous inputs.

Detection Methods

Dynamic Analysis with Manual Results Interpretation

Use known techniques for prompt injection and other attacks, and adjust the attacks to be more specific to the model or system.

Dynamic Analysis with Automated Results Interpretation

Use known techniques for prompt injection and other attacks, and adjust the attacks to be more specific to the model or system.

Architecture or Design Review

Review of the product design can be effective, but it works best in conjunction with dynamic analysis.

Vulnerability Mapping Notes

Justification : This CWE entry is at the Base level of abstraction, which is a preferred level of abstraction for mapping to the root causes of vulnerabilities.
Comment : Ensure that the weakness being identified involves improper neutralization during prompt generation. A different CWE might be needed if the core concern is related to inadvertent insertion of sensitive information, generating prompts from third-party sources that should not have been trusted (as may occur with indirect prompt injection), or jailbreaking, then the root cause might be a different weakness.

References

REF-1450

OWASP Top 10 for Large Language Model Applications - LLM01
OWASP.
https://genai.owasp.org/llmrisk/llm01-prompt-injection/

REF-1451

IBM - What is a prompt injection attack?
Matthew Kosinski, Amber Forrest.
https://www.ibm.com/topics/prompt-injection

REF-1452

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz.
https://arxiv.org/abs/2302.12173

Submission

Name	Organization	Date	Date release	Version
Max Rattray	Praetorian	2024-06-21 +00:00	2024-11-19 +00:00	4.16

CWE-1427 Detail

CWE-1427