Modes Of Introduction
Architecture and Design :
Developers may rely heavily on protection mechanisms such as
input filtering and model alignment, assuming they are more effective
than they actually are.
Implementation :
Developers may rely heavily on protection mechanisms such as
input filtering and model alignment, assuming they are more effective
than they actually are.
Applicable Platforms
Language
Class: Not Language-Specific (Undetermined)
Architectures
Class: Not Architecture-Specific (Undetermined)
Technologies
Name: AI/ML (Undetermined)
Class: Not Technology-Specific (Undetermined)
Common Consequences
Scope |
Impact |
Likelihood |
Integrity | Execute Unauthorized Code or Commands, Varies by Context
Note: In an agent-oriented setting,
output could be used to cause unpredictable agent
invocation, i.e., to control or influence agents
that might be invoked from the output. The impact
varies depending on the access that is granted to
the tools, such as creating a database or writing
files.
| |
Observed Examples
Reference |
Description |
CVE-2024-3402 | chain: GUI for ChatGPT API performs
input validation but does not properly "sanitize"
or validate model output data (CWE-1426), leading
to XSS (CWE-79). |
Potential Mitigations
Phases : Architecture and Design
Since the output from a generative AI component (such as an LLM) cannot be trusted, ensure that it operates in an untrusted or non-privileged space.
Phases : Operation
Use "semantic comparators," which are mechanisms that
provide semantic comparison to identify objects that might appear
different but are semantically similar.
Phases : Operation
Use components that operate
externally to the system to monitor the output and
act as a moderator. These components are called
different terms, such as supervisors or
guardrails.
Phases : Build and Compilation
During model training, use an appropriate variety of good
and bad examples to guide preferred outputs.
Detection Methods
Dynamic Analysis with Manual Results Interpretation
Use known techniques for prompt injection
and other attacks, and adjust the attacks to be more
specific to the model or system.
Dynamic Analysis with Automated Results Interpretation
Use known techniques for prompt injection
and other attacks, and adjust the attacks to be more
specific to the model or system.
Architecture or Design Review
Review of the product design can be
effective, but it works best in conjunction with dynamic
analysis.
Vulnerability Mapping Notes
Rationale : There is potential for this CWE entry to be modified in the future for further clarification as the research community continues to better understand weaknesses in this domain.
Comments :
This CWE entry is only related to "validation" of output and might be used mistakenly for other kinds of output-related weaknesses. Careful attention should be paid to whether this CWE should be used for vulnerabilities related to "prompt injection," which is an attack that works against many different weaknesses. See Maintenance Notes and Research Gaps. Analysts should closely investigate the root cause to ensure it is not ultimately due to other well-known weaknesses. The following suggestions are not comprehensive.
Notes
This entry is related to AI/ML, which is not well
understood from a weakness perspective. Typically, for
new/emerging technologies including AI/ML, early
vulnerability discovery and research does not focus on
root cause analysis (i.e., weakness identification). For
AI/ML, the recent focus has been on attacks and
exploitation methods, technical impacts, and mitigations.
As a result, closer research or focused efforts by SMEs
is necessary to understand the underlying weaknesses.
Diverse and dynamic terminology and rapidly-evolving
technology further complicate understanding. Finally,
there might not be enough real-world examples with
sufficient details from which weakness patterns may be
discovered. For example, many real-world vulnerabilities
related to "prompt injection" appear to be related to
typical injection-style attacks in which the only
difference is that the "input" to the vulnerable
component comes from model output instead of direct
adversary input, similar to "second-order SQL injection"
attacks.
This entry was created by members
of the CWE AI Working Group during June and July 2024. The
CWE Project Lead, CWE Technical Lead, AI WG co-chairs, and
many WG members decided that for purposes of timeliness, it
would be more helpful to the CWE community to publish the
new entry in CWE 4.15 quickly and add to it in subsequent
versions.
References
REF-1441
LLM02: Insecure Output Handling
OWASP.
https://genai.owasp.org/llmrisk/llm02-insecure-output-handling/ REF-1442
Validating Outputs
Cohere, Guardrails AI.
https://cohere.com/blog/validating-llm-outputs REF-1443
NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails
Traian Rebedea, Razvan Dinu, Makesh Sreedhar, Christopher Parisien, Jonathan Cohen.
https://aclanthology.org/2023.emnlp-demo.40/ REF-1444
Insecure output handling in LLMs
Snyk.
https://learn.snyk.io/lesson/insecure-input-handling/ REF-1445
Building Guardrails for Large Language Models
Yi Dong, Ronghui Mu, Gaojie Jin, Yi Qi, Jinwei Hu, Xingyu Zhao, Jie Meng, Wenjie Ruan, Xiaowei Huang.
https://arxiv.org/pdf/2402.01822
Submission
Name |
Organization |
Date |
Date Release |
Version |
Members of the CWE AI WG |
CWE Artificial Intelligence (AI) Working Group (WG) |
2024-07-02 +00:00 |
2024-07-16 +00:00 |
4.15 |