[Survey the application for user-controllable inputs] Using a browser or an automated tool, an attacker follows all public links and actions on a web site. They record all the links, the forms, the resources accessed and all other potential entry-points for the web application.
[Probe entry points to locate vulnerabilities] The attacker uses the entry points gathered in the "Explore" phase as a target list and injects various UTF-8 encoded payloads to determine if an entry point actually represents a vulnerability with insufficient validation logic and to characterize the extent to which the vulnerability can be exploited.
The exact response required from an UTF-8 decoder on invalid input is not uniformly defined by the standards. In general, there are several ways a UTF-8 decoder might behave in the event of an invalid byte sequence:
It is possible for a decoder to behave in different ways for different types of invalid input.
RFC 3629 only requires that UTF-8 decoders must not decode "overlong sequences" (where a character is encoded in more bytes than needed but still adheres to the forms above). The Unicode Standard requires a Unicode-compliant decoder to "...treat any ill-formed code unit sequence as an error condition. This guarantees that it will neither interpret nor emit an ill-formed code unit sequence."
Overlong forms are one of the most troublesome types of UTF-8 data. The current RFC says they must not be decoded but older specifications for UTF-8 only gave a warning and many simpler decoders will happily decode them. Overlong forms have been used to bypass security validations in high profile products including Microsoft's IIS web server. Therefore, great care must be taken to avoid security issues if validation is performed before conversion from UTF-8, and it is generally much simpler to handle overlong forms before any input validation is done.
To maintain security in the case of invalid input, there are two options. The first is to decode the UTF-8 before doing any input validation checks. The second is to use a decoder that, in the event of invalid input, returns either an error or text that the application considers to be harmless. Another possibility is to avoid conversion out of UTF-8 altogether but this relies on any other software that the data is passed to safely handling the invalid data.
Another consideration is error recovery. To guarantee correct recovery after corrupt or lost bytes, decoders must be able to recognize the difference between lead and trail bytes, rather than just assuming that bytes will be of the type allowed in their position.
Weakness Name | |
---|---|
Improper Handling of Alternate Encoding The product does not properly handle when an input uses an alternate encoding that is valid for the control sphere to which the input is being sent. |
|
Encoding Error The product does not properly encode or decode the data, resulting in unexpected values. |
|
Incorrect Behavior Order: Validate Before Canonicalize The product validates input before it is canonicalized, which prevents the product from detecting data that becomes invalid after the canonicalization step. |
|
Incorrect Behavior Order: Validate Before Filter The product validates data before it has been filtered, which prevents the product from detecting data that becomes invalid after the filtering step. |
|
External Control of File Name or Path The product allows user input to control or influence paths or file names that are used in filesystem operations. |
|
Improper Neutralization of Special Elements in Output Used by a Downstream Component ('Injection') The product constructs all or part of a command, data structure, or record using externally-influenced input from an upstream component, but it does not neutralize or incorrectly neutralizes special elements that could modify how it is parsed or interpreted when it is sent to a downstream component. |
|
Improper Input Validation The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly. |
|
Incorrect Comparison The product compares two entities in a security-relevant context, but the comparison is incorrect, which may lead to resultant weaknesses. |
|
Incomplete Denylist to Cross-Site Scripting The product uses a denylist-based protection mechanism to defend against XSS attacks, but the denylist is incomplete, allowing XSS variants to succeed. |
Name | Organization | Date | Date Release |
---|---|---|---|
CAPEC Content Team | The MITRE Corporation |
Name | Organization | Date | Comment |
---|---|---|---|
CAPEC Content Team | The MITRE Corporation | Updated References | |
CAPEC Content Team | The MITRE Corporation | Updated Example_Instances, Execution_Flow, Mitigations, Skills_Required | |
CAPEC Content Team | The MITRE Corporation | Updated Related_Weaknesses | |
CAPEC Content Team | The MITRE Corporation | Updated Example_Instances, Mitigations |