org.owasp.esapi
Class Encoder

java.lang.Object
  extended by org.owasp.esapi.Encoder
All Implemented Interfaces:
IEncoder

public class Encoder
extends java.lang.Object
implements IEncoder

Reference implementation of the IEncoder interface. This implementation takes a whitelist approach, encoding everything not specifically identified in a list of "immune" characters. Several methods follow the approach in the Microsoft AntiXSS Library.
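The whitelist approach can be illustrated with a short sketch. It is hypothetical and not ESAPI's code: the class `ImmuneEncoderSketch`, its `encode` method, and the use of `&#xHH;` hexadecimal entities as the output format are assumptions made for illustration. ASCII letters and digits always pass through. Every other code point is encoded unless it appears in the caller's immune list.

```java
/**
 * Hypothetical sketch of whitelist ("immune character") encoding.
 * Not ESAPI's implementation: names and the &#xHH; output format are assumptions.
 */
final class ImmuneEncoderSketch {

    /** ASCII letters and digits are always passed through unchanged. */
    private static boolean isAlphanumeric(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') || (cp >= '0' && cp <= '9');
    }

    /** True if the code point appears in the caller-supplied immune list. */
    private static boolean isImmune(int cp, char[] immune) {
        for (char c : immune) {
            if (c == cp) {
                return true;
            }
        }
        return false;
    }

    /** Encodes every code point that is neither alphanumeric nor immune as a hex entity. */
    static String encode(String input, char[] immune) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i); // whole code point, so surrogate pairs stay together
            i += Character.charCount(cp);
            if (isAlphanumeric(cp) || isImmune(cp, immune)) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char[] immune = { ',', '.', '-', '_', ' ' };
        System.out.println(encode("<b>Hi, Bob</b>", immune));
    }
}
```

Encoding everything that is not explicitly allowed fails safe: a character nobody anticipated is encoded rather than passed through to the interpreter.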

The canonicalization algorithm is complex, because it must recognize encoded characters that might affect downstream interpreters without being told which encodings are possible. The stream is read one character at a time. If an encoded character is encountered, it is decoded and pushed back onto the stream. If the character read next from the stream (the one just pushed back) is itself the start of an encoded character, an IntrusionException is thrown, because such double-encoding is assumed to be an attack. This assumption is somewhat aggressive, as ordinary users may send some double-encoded characters through cut-and-paste.

If an encoded character is recognized but does not parse properly, it is stripped from the input.
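The pushback loop described above might look roughly like the following. This is a hypothetical sketch, not ESAPI's implementation: the class name, the nested `IntrusionException` stand-in, and the supported encodings (`%HH`, `&#NN;`, `&#xHH;`, and only the named entities `lt`, `gt`, `amp`, `quot`) are assumptions. A decoded character is written back into the stream; if that character begins another encoded sequence, the input is rejected as double-encoded. Recognized but unparseable sequences, such as unknown named entities, are stripped.

```java
import java.util.Map;

/** Hypothetical sketch of pushback canonicalization with double-encoding detection. */
final class CanonicalizerSketch {

    /** Stand-in for ESAPI's IntrusionException (hypothetical). */
    static final class IntrusionException extends RuntimeException {
        IntrusionException(String message) {
            super(message);
        }
    }

    private static final Map<String, Character> NAMED =
            Map.of("lt", '<', "gt", '>', "amp", '&', "quot", '"');

    /** A recognized encoded sequence: its decoded char (null means strip it) and its length. */
    private static final class Match {
        final Character decoded;
        final int length;

        Match(Character decoded, int length) {
            this.decoded = decoded;
            this.length = length;
        }
    }

    /** Returns the encoded sequence starting at pos, or null if none starts there. */
    private static Match match(StringBuilder s, int pos) {
        char c = s.charAt(pos);
        if (c == '%' && pos + 2 < s.length()
                && Character.digit(s.charAt(pos + 1), 16) >= 0
                && Character.digit(s.charAt(pos + 2), 16) >= 0) {
            return new Match((char) Integer.parseInt(s.substring(pos + 1, pos + 3), 16), 3);
        }
        if (c == '&') {
            int semi = s.indexOf(";", pos + 1);
            if (semi < 0) {
                return null; // a bare '&' is literal text
            }
            String body = s.substring(pos + 1, semi);
            int length = semi - pos + 1;
            int value;
            if (body.matches("#[0-9]{1,7}")) {
                value = Integer.parseInt(body.substring(1));
            } else if (body.matches("#[xX][0-9a-fA-F]{1,6}")) {
                value = Integer.parseInt(body.substring(2), 16);
            } else if (body.matches("[a-zA-Z]+")) {
                return new Match(NAMED.get(body), length); // unknown name yields null: stripped
            } else {
                return null;
            }
            // Numeric references outside the char range do not parse: strip them.
            return new Match(value <= 0xFFFF ? Character.valueOf((char) value) : null, length);
        }
        return null;
    }

    static String canonicalize(String input) {
        StringBuilder stream = new StringBuilder(input);
        StringBuilder out = new StringBuilder(input.length());
        int pos = 0;
        boolean pushedBack = false; // true when the char at pos came from decoding
        while (pos < stream.length()) {
            Match m = match(stream, pos);
            if (m == null) {
                out.append(stream.charAt(pos++));
                pushedBack = false;
            } else if (pushedBack) {
                // A decoded character starts another encoded sequence.
                throw new IntrusionException("double encoding detected at position " + pos);
            } else if (m.decoded == null) {
                pos += m.length; // recognized but unparseable: strip it
            } else {
                // Decode and push the result back onto the stream.
                stream.replace(pos, pos + m.length, String.valueOf(m.decoded));
                pushedBack = true;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("%3Cscript%3E"));
        try {
            canonicalize("%26lt;");
        } catch (IntrusionException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```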

Currently the implementation supports:

Since:
June 1, 2007
Author:
Jeff Williams (jeff.williams .at. aspectsecurity.com) Aspect Security
See Also:
IEncoder

Field Summary
static char[] CHAR_ALPHANUMERICS
          The Constant CHAR_ALPHANUMERICS.
static char[] CHAR_DIGITS
          The Constant CHAR_DIGITS.
static char[] CHAR_LETTERS
          The Constant CHAR_LETTERS.
static char[] CHAR_LOWERS
          The Constant CHAR_LOWERS.
static char[] CHAR_PASSWORD_LETTERS
           
static char[] CHAR_SPECIALS
          The Constant CHAR_SPECIALS.
static char[] CHAR_UPPERS
          The Constant CHAR_UPPERS.
static int ENTITY_ENCODING
           
static int NO_ENCODING
          Encoding types
static int PERCENT_ENCODING
           
static int URL_ENCODING
           
 
Constructor Summary
Encoder()
           
 
Method Summary
 java.lang.String canonicalize(java.lang.String input)
          Simplifies percent-encoded and entity-encoded characters to their simplest form so that they can be properly validated.
 byte[] decodeFromBase64(java.lang.String input)
          Decode Base64-encoded data.
 java.lang.String decodeFromURL(java.lang.String input)
          Decode URL-encoded data.
 java.lang.String encodeForBase64(byte[] input, boolean wrap)
          Encode data using Base64.
 java.lang.String encodeForDN(java.lang.String input)
          Encode data for use in an LDAP distinguished name.
 java.lang.String encodeForHTML(java.lang.String input)
          Encode data for use in HTML content.
 java.lang.String encodeForHTMLAttribute(java.lang.String input)
          Encode data for use in HTML attributes.
 java.lang.String encodeForJavascript(java.lang.String input)
          Encode data for use in JavaScript.
 java.lang.String encodeForLDAP(java.lang.String input)
          Encode data for use in LDAP queries.
 java.lang.String encodeForSQL(java.lang.String input)
          This method is not recommended.
 java.lang.String encodeForURL(java.lang.String input)
          Encode for use in a URL.
 java.lang.String encodeForVBScript(java.lang.String input)
          Encode data for use in VBScript.
 java.lang.String encodeForXML(java.lang.String input)
          Encode data for use in an XML element.
 java.lang.String encodeForXMLAttribute(java.lang.String input)
          Encode data for use in an XML attribute.
 java.lang.String encodeForXPath(java.lang.String input)
          This implementation encodes almost everything and may overencode.
static void main(java.lang.String[] args)
           
 java.lang.String normalize(java.lang.String input)
          Normalizes special characters down to ASCII using the Normalizer built into Java.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NO_ENCODING

public static final int NO_ENCODING
Encoding types

See Also:
Constant Field Values

URL_ENCODING

public static final int URL_ENCODING
See Also:
Constant Field Values

PERCENT_ENCODING

public static final int PERCENT_ENCODING
See Also:
Constant Field Values

ENTITY_ENCODING

public static final int ENTITY_ENCODING
See Also:
Constant Field Values

CHAR_LOWERS

public static final char[] CHAR_LOWERS
The Constant CHAR_LOWERS.


CHAR_UPPERS

public static final char[] CHAR_UPPERS
The Constant CHAR_UPPERS.


CHAR_DIGITS

public static final char[] CHAR_DIGITS
The Constant CHAR_DIGITS.


CHAR_SPECIALS

public static final char[] CHAR_SPECIALS
The Constant CHAR_SPECIALS.


CHAR_LETTERS

public static final char[] CHAR_LETTERS
The Constant CHAR_LETTERS.


CHAR_ALPHANUMERICS

public static final char[] CHAR_ALPHANUMERICS
The Constant CHAR_ALPHANUMERICS.


CHAR_PASSWORD_LETTERS

public static final char[] CHAR_PASSWORD_LETTERS

Constructor Detail

Encoder

public Encoder()

Method Detail

canonicalize

public java.lang.String canonicalize(java.lang.String input)
Simplifies percent-encoded and entity-encoded characters to their simplest form so that they can be properly validated. Attackers frequently use encoding schemes to disguise their attacks and bypass validation routines. Handling multiple encoding schemes simultaneously is difficult and requires special consideration. In particular, double-encoding is difficult for parsers, and combining several encoding schemes in double-encoding makes it even harder. Consider decoding

 &amp;lt;

or

 %26lt;

or

 &#x26;lt;

Each of these decodes first to &lt; and then to <. This implementation disallows ALL double-encoded characters and throws an IntrusionException when they are detected. Also, named entities that are not known are simply removed. Note that most data from the browser is likely to be encoded with URL encoding. The web server decodes URL and form data once, so most encoded data received by the application must have been double-encoded by the attacker. However, some HTTP inputs are not decoded by the web server, so this routine allows a single level of decoding.

Specified by:
canonicalize in interface IEncoder
Parameters:
input - unvalidated input from an HTTP request
Returns:
the canonicalized string
Throws:
IntrusionException
See Also:
org.owasp.esapi.interfaces.IValidator#canonicalize(java.lang.String)
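The following hypothetical demo shows why a single decoding pass is not the end of the story. It uses `java.net.URLDecoder` (the `Charset` overload requires Java 10 or later) to stand in for the decoding the web server performs; the class and method names are illustrative. The server-decoded value still contains entities that an HTML interpreter would turn into `<` and `>`.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: one decoding pass, as a web server performs, leaves entities behind. */
final class SingleDecodeDemo {
    static String serverDecode(String raw) {
        // URLDecoder.decode(String, Charset) requires Java 10 or later.
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String received = serverDecode("%26lt;script%26gt;");
        System.out.println(received); // &lt;script&gt;
    }
}
```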

normalize

public java.lang.String normalize(java.lang.String input)
Normalizes special characters down to ASCII using the Normalizer built into Java. Note that this method may introduce security issues if characters are normalized into special characters that have meaning to the destination of the data.

Specified by:
normalize in interface IEncoder
Parameters:
input - the string to normalize
Returns:
the normalized string
See Also:
org.owasp.esapi.interfaces.IValidator#normalize(java.lang.String)
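A minimal sketch of normalization with `java.text.Normalizer` follows. It assumes canonical decomposition (NFD) followed by removal of combining marks, which may not be the form this implementation uses, and the class and method names are hypothetical. Characters with no decomposition, such as `ß`, pass through unchanged, so the result is not guaranteed to be pure ASCII. The second method illustrates the security caveat above: compatibility normalization (NFKC) folds the fullwidth signs U+FF1C and U+FF1E into ASCII `<` and `>`.

```java
import java.text.Normalizer;

/** Hypothetical sketch of accent stripping and of the compatibility-folding risk. */
final class NormalizeSketch {
    /** Decomposes (NFD) and drops combining marks, so an accented e becomes a plain e. */
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    /** Compatibility composition (NFKC); shown only to illustrate the risk. */
    static String compatibilityFold(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("caf\u00e9")); // cafe
        // Fullwidth U+FF1C and U+FF1E fold to ASCII '<' and '>', which may
        // reach an HTML interpreter after validation has already passed.
        System.out.println(compatibilityFold("\uFF1Cscript\uFF1E")); // <script>
    }
}
```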

encodeForHTML

public java.lang.String encodeForHTML(java.lang.String input)
Description copied from interface: IEncoder
Encode data for use in HTML content. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is entity-encoded using a whitelist.

Specified by:
encodeForHTML in interface IEncoder
Parameters:
input - the input
Returns:
the input, entity-encoded for safe use in HTML content

encodeForHTMLAttribute

public java.lang.String encodeForHTMLAttribute(java.lang.String input)
Description copied from interface: IEncoder
Encode data for use in HTML attributes. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is entity-encoded using a whitelist.

Specified by:
encodeForHTMLAttribute in interface IEncoder
Parameters:
input - the input
Returns:
the input, entity-encoded for safe use in an HTML attribute value

encodeForJavascript

public java.lang.String encodeForJavascript(java.lang.String input)
Description copied from interface: IEncoder
Encode data for use in JavaScript. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

Specified by:
encodeForJavascript in interface IEncoder
Parameters:
input - the input
Returns:
the input, encoded for safe use in JavaScript
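A sketch of whitelist encoding for a value placed inside a quoted JavaScript string literal follows. It is hypothetical (the class name and the exact escape format are assumptions): ASCII letters and digits pass through, other characters below 0x100 become two-digit `\xHH` escapes, and everything else becomes a four-digit `\uHHHH` escape. Because every quote, backslash, and angle bracket in the input is escaped, the value can close neither the string literal nor a surrounding `<script>` element.

```java
/** Hypothetical sketch of whitelist escaping for a quoted JavaScript string literal. */
final class JsStringEncoderSketch {
    /** Escapes everything except ASCII letters and digits. */
    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i); // UTF-16 unit; JavaScript escapes also use UTF-16 units
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (alnum) {
                out.append(c);
            } else if (c < 0x100) {
                out.append(String.format("\\x%02X", (int) c)); // two-digit hex escape
            } else {
                out.append(String.format("\\u%04X", (int) c)); // four-digit unicode escape
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "'); alert(1); //";
        System.out.println("var name = '" + encode(name) + "';");
    }
}
```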

encodeForVBScript

public java.lang.String encodeForVBScript(java.lang.String input)
Description copied from interface: IEncoder
Encode data for use in VBScript. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

Specified by:
encodeForVBScript in interface IEncoder
Parameters:
input - the input
Returns:
the input, encoded for safe use in VBScript

encodeForSQL

public java.lang.String encodeForSQL(java.lang.String input)
This method is not recommended. Using a PreparedStatement is the normal and preferred approach. However, if that is impossible for some reason, this method is provided as a weaker alternative. The best approach is to make sure that every single quote is doubled (' becomes ''). Another possible approach is to use the {escape} syntax described in section 1.5.6 of the JDBC specification (see http://java.sun.com/j2se/1.4.2/docs/guide/jdbc/getstart/statement.html). However, this syntax does not work with all drivers and requires modifying every query.

Specified by:
encodeForSQL in interface IEncoder
Parameters:
input - the input
Returns:
the input, encoded for use in a SQL query
See Also:
IEncoder.encodeForSQL(java.lang.String)
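The quote-doubling fallback can be sketched as below. It is a hypothetical illustration, not this method's code, and the class name is illustrative. Doubling quotes is only sufficient for databases that treat `''` as the sole escape inside a string literal; MySQL, for example, also treats backslash as an escape character by default, so doubling quotes alone does not stop input that places a backslash before a quote. The comment in `main` shows the preferred `PreparedStatement` form.

```java
/** Hypothetical sketch of the weaker quote-doubling fallback. */
final class SqlQuoteSketch {
    /** Doubles each single quote so the value cannot close a '...' literal. */
    static String escapeSingleQuotes(String input) {
        return input.replace("'", "''");
    }

    /** Wraps the escaped value in single quotes for inclusion in SQL text. */
    static String literal(String input) {
        return "'" + escapeSingleQuotes(input) + "'";
    }

    public static void main(String[] args) {
        // Preferred instead: conn.prepareStatement("SELECT * FROM users WHERE name = ?")
        // followed by ps.setString(1, name); no escaping is needed then.
        System.out.println("SELECT * FROM users WHERE name = " + literal("O'Brien"));
    }
}
```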

encodeForLDAP

public java.lang.String encodeForLDAP(java.lang.String input)
Description copied from interface: IEncoder
Encode data for use in LDAP queries. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

Specified by:
encodeForLDAP in interface IEncoder
Parameters:
input - the input
Returns:
the input, encoded for safe use in an LDAP query
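For LDAP search filters, the characters with special meaning are fixed by RFC 4515: `*`, `(`, `)`, `\`, and NUL, each of which is written as a backslash followed by two hex digits. The hypothetical sketch below applies exactly those escapes; it escapes a short list of special characters rather than using the whitelist this implementation describes, and the class name is illustrative.

```java
/** Hypothetical sketch of RFC 4515 escaping for values inside an LDAP search filter. */
final class LdapFilterEncoderSketch {
    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\\': out.append("\\5c"); break;
                case '*':  out.append("\\2a"); break;
                case '(':  out.append("\\28"); break;
                case ')':  out.append("\\29"); break;
                case '\0': out.append("\\00"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String user = "*)(uid=*";
        System.out.println("(&(uid=" + encode(user) + ")(objectClass=person))");
    }
}
```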

encodeForDN

public java.lang.String encodeForDN(java.lang.String input)
Description copied from interface: IEncoder
Encode data for use in an LDAP distinguished name. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

Specified by:
encodeForDN in interface IEncoder
Parameters:
input - the input
Returns:
the input, encoded for safe use in an LDAP distinguished name
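Distinguished names follow different rules, defined in RFC 4514: `,` `+` `"` `\` `<` `>` `;` must be backslash-escaped anywhere, a leading space or `#` and a trailing space must be escaped, and NUL is written as `\00`. The sketch below is hypothetical (the class name is illustrative) and applies only those rules; it does not handle multi-valued RDNs or hex-escaping of non-ASCII characters.

```java
/** Hypothetical sketch of RFC 4514 escaping for one attribute value in a DN. */
final class DnEncoderSketch {
    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean first = i == 0;
            boolean last = i == input.length() - 1;
            if ("\\,+\"<>;".indexOf(c) >= 0            // special anywhere
                    || (first && (c == ' ' || c == '#')) // special at the start
                    || (last && c == ' ')) {             // special at the end
                out.append('\\').append(c);
            } else if (c == '\0') {
                out.append("\\00");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("cn=" + encode("Smith, John") + ",ou=people,dc=example,dc=com");
    }
}
```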

encodeForXPath

public java.lang.String encodeForXPath(java.lang.String input)
This implementation encodes almost everything and may overencode. The difficulty is that XPath has no built-in mechanism for escaping characters. It is possible to use XQuery in a parameterized way to prevent injection. For more information, refer to this article, which lists the following characters as the most dangerous: ^&"*';<>(). This paper suggests disallowing ' and " in queries.

Specified by:
encodeForXPath in interface IEncoder
Parameters:
input - the input
Returns:
the input, encoded for use in an XPath query
See Also:
IEncoder.encodeForXPath(java.lang.String)
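One common workaround for the missing escape mechanism in XPath 1.0 is to choose the quote delimiter that does not occur in the value and, when both quote types occur, to assemble the string with `concat()`. This is a different technique from the over-encoding this implementation uses, shown here as a hypothetical sketch with illustrative names. Where the query API allows it, binding values as variables, for example through `javax.xml.xpath.XPathVariableResolver`, avoids building literals at all.

```java
/** Hypothetical sketch: build an XPath 1.0 expression that evaluates to exactly the given string. */
final class XPathLiteralSketch {
    static String toLiteral(String value) {
        // XPath 1.0 string literals cannot contain their own delimiter.
        if (value.indexOf('\'') < 0) {
            return "'" + value + "'";
        }
        if (value.indexOf('"') < 0) {
            return "\"" + value + "\"";
        }
        // Both quote types occur: split on ' and rejoin the pieces with "'".
        String[] parts = value.split("'", -1);
        StringBuilder sb = new StringBuilder("concat(");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", \"'\", ");
            }
            sb.append('\'').append(parts[i]).append('\'');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println("//user[name=" + toLiteral("a\"b'c") + "]");
    }
}
```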

encodeForXML

public java.lang.String encodeForXML(java.lang.String input)
Description copied from interface: IEncoder
Encode data for use in an XML element. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist. The implementation should follow the XML Encoding Standard from the W3C.

The use of a real XML parser is strongly encouraged. However, in the hopefully rare case that you need to make sure that data is safe for inclusion in an XML document and cannot use a parser, this method provides a safe mechanism to do so.

Specified by:
encodeForXML in interface IEncoder
Parameters:
input - the input
Returns:
the input, encoded for safe use in an XML element
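A minimal sketch of XML escaping with the five predefined entities from the XML 1.0 specification follows. The class name is hypothetical, and the method is not presented as this implementation's whitelist encoder. Because it escapes both quote characters, the same output is also safe inside a quoted attribute value. Characters that XML 1.0 forbids outright (most C0 control characters) cannot be represented even with character references, so a stricter version would reject or remove them.

```java
/** Hypothetical sketch of escaping with the five predefined XML entities. */
final class XmlEscapeSketch {
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("<title>" + escape("Tom & Jerry's <show>") + "</title>");
    }
}
```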

encodeForXMLAttribute

public java.lang.String encodeForXMLAttribute(java.lang.String input)
Description copied from interface: IEncoder
Encode data for use in an XML attribute. The implementation should follow the XML Encoding Standard from the W3C. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

The use of a real XML parser is highly encouraged. However, in the hopefully rare case that you need to make sure that data is safe for inclusion in an XML document and cannot use a parser, this method provides a safe mechanism to do so.

Specified by:
encodeForXMLAttribute in interface IEncoder
Parameters:
input - the input
Returns:
the input, encoded for safe use in an XML attribute

encodeForURL

public java.lang.String encodeForURL(java.lang.String input)
                              throws EncodingException
Description copied from interface: IEncoder
Encode for use in a URL. This method performs URL encoding on the entire string. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

Specified by:
encodeForURL in interface IEncoder
Parameters:
input - the input
Returns:
the URL-encoded input
Throws:
EncodingException
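The sketch below uses `java.net.URLEncoder`, which implements `application/x-www-form-urlencoded` encoding; the `Charset` overload requires Java 10 or later, and the class name and URL are illustrative. It is meant for a single query-parameter value: encoding an entire URL this way would also encode its `:`, `/`, `?`, and `=` separators, and form encoding turns a space into `+`, which is not appropriate in path segments.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of form-encoding one query-parameter value. */
final class UrlEncodeSketch {
    static String encodeParam(String value) {
        // Letters, digits, '.', '-', '*' and '_' are kept; a space becomes '+';
        // every other character becomes %XX for each of its UTF-8 bytes.
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = "https://example.com/search?q=" + encodeParam("a b&c=d");
        System.out.println(url); // https://example.com/search?q=a+b%26c%3Dd
    }
}
```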

decodeFromURL

public java.lang.String decodeFromURL(java.lang.String input)
                               throws EncodingException
Description copied from interface: IEncoder
Decode URL-encoded data. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is decoded using URL decoding.

Specified by:
decodeFromURL in interface IEncoder
Parameters:
input - the input
Returns:
the URL-decoded input
Throws:
EncodingException

encodeForBase64

public java.lang.String encodeForBase64(byte[] input,
                                        boolean wrap)
Description copied from interface: IEncoder
Encode data using Base64.

Beware of double-encoding, as it will corrupt the results and could cause a downstream security mechanism to make a mistake.

Specified by:
encodeForBase64 in interface IEncoder
Parameters:
input - the input
wrap - whether the encoded output is broken into multiple lines
Returns:
the Base64-encoded string
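This class predates `java.util.Base64` (added in Java 8), but the encoding itself can be sketched with it. The sketch assumes that `wrap` means MIME-style line breaking; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical sketch of Base64 encoding with optional line wrapping. */
final class Base64Sketch {
    /** Encodes input; wrap == true is assumed to mean MIME-style line breaks. */
    static String encode(byte[] input, boolean wrap) {
        // getMimeEncoder() inserts CRLF after every 76 output characters;
        // getEncoder() produces a single unbroken line.
        Base64.Encoder encoder = wrap ? Base64.getMimeEncoder() : Base64.getEncoder();
        return encoder.encodeToString(input);
    }

    public static void main(String[] args) {
        System.out.println(encode("hello".getBytes(StandardCharsets.UTF_8), false)); // aGVsbG8=
    }
}
```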

decodeFromBase64

public byte[] decodeFromBase64(java.lang.String input)
                        throws java.io.IOException
Description copied from interface: IEncoder
Decode Base64-encoded data.

Beware double-encoded data, as the results of this method could still contain encoded characters as part of attacks.

Specified by:
decodeFromBase64 in interface IEncoder
Parameters:
input - the input
Returns:
the decoded bytes
Throws:
java.io.IOException - Signals that an I/O exception has occurred.
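The warning above can be made concrete with a short hypothetical sketch using `java.util.Base64` (Java 8 or later; the class name is illustrative). Decoding removes only the Base64 layer: the example input decodes to `%3Cscript%3E`, which is still percent-encoded and must be canonicalized and validated like any other input. Note that `Base64.Decoder` reports malformed input with `IllegalArgumentException`, not the `IOException` declared above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical sketch: Base64 decoding does not remove inner encodings. */
final class Base64DecodeSketch {
    static String decodeToText(String input) {
        // Throws IllegalArgumentException (not IOException) on malformed input.
        byte[] bytes = Base64.getDecoder().decode(input);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints %3Cscript%3E: still percent-encoded, so canonicalize and validate it next.
        System.out.println(decodeToText("JTNDc2NyaXB0JTNF"));
    }
}
```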

main

public static void main(java.lang.String[] args)