org.owasp.esapi.interfaces
Interface IEncoder

All Known Implementing Classes:
Encoder

public interface IEncoder

The IEncoder interface contains a number of methods related to encoding input so that it will be safe for a variety of interpreters. To prevent double-encoding, all encoding methods should first check to see that the input does not already contain encoded characters. There are a few methods related to decoding that are used for canonicalization purposes. See the Validator class for more information.

All of the methods here must use a "whitelist" or "positive" security model, meaning that all characters should be encoded, except for a specific list of "immune" characters that are known to be safe.

Since:
June 1, 2007
Author:
Jeff Williams (jeff.williams .at. aspectsecurity.com) Aspect Security

Method Summary
 java.lang.String canonicalize(java.lang.String input)
          This method performs canonicalization on data received to ensure that it has been reduced to its most basic form before validation.
 byte[] decodeFromBase64(java.lang.String input)
          Decode data encoded with BASE-64 encoding.
 java.lang.String decodeFromURL(java.lang.String input)
          Decode from URL.
 java.lang.String encodeForBase64(byte[] input, boolean wrap)
          Encode for base64.
 java.lang.String encodeForDN(java.lang.String input)
          Encode data for use in an LDAP distinguished name.
 java.lang.String encodeForHTML(java.lang.String input)
          Encode data for use in HTML content.
 java.lang.String encodeForHTMLAttribute(java.lang.String input)
          Encode data for use in HTML attributes.
 java.lang.String encodeForJavascript(java.lang.String input)
          Encode for JavaScript.
 java.lang.String encodeForLDAP(java.lang.String input)
          Encode data for use in LDAP queries.
 java.lang.String encodeForSQL(java.lang.String input)
          Encode for SQL.
 java.lang.String encodeForURL(java.lang.String input)
          Encode for use in a URL.
 java.lang.String encodeForVBScript(java.lang.String input)
          Encode data for use in Visual Basic Script.
 java.lang.String encodeForXML(java.lang.String input)
          Encode data for use in an XML element.
 java.lang.String encodeForXMLAttribute(java.lang.String input)
          Encode data for use in an XML attribute.
 java.lang.String encodeForXPath(java.lang.String input)
          Encode data for use in an XPath query.
 java.lang.String normalize(java.lang.String input)
          Reduce all non-ASCII characters to their ASCII form so that simpler validation rules can be applied.
 

Method Detail

canonicalize

public java.lang.String canonicalize(java.lang.String input)
                              throws EncodingException
This method performs canonicalization on data received to ensure that it has been reduced to its most basic form before validation. For example, URL-encoded data received from ordinary "application/x-www-form-urlencoded" forms is decoded so that it may be validated properly.

Canonicalization is simply the operation of reducing a possibly encoded string down to its simplest form. This is important, because attackers frequently use encoding to change their input in a way that will bypass validation filters, but still be interpreted properly by the target of the attack. Note that data encoded more than once is not something that a normal user would generate and should be regarded as an attack.

For input that comes from an HTTP servlet request, there are generally two types of encoding to be concerned with. The first is "application/x-www-form-urlencoded", which is what is typically used in most forms and URIs, where characters are encoded in a %xy format. The other common type of character encoding is HTML entity encoding, which uses several formats:

&lt;, &#117;, and &#x3a;.

Note that all of these formats may render properly in a browser even without the trailing semicolon.

Double-encoding is a particularly thorny problem, as applying ordinary decoders may introduce encoded characters, even characters encoded with a different encoding scheme. For example, %26lt; is a < character that has been entity-encoded (&lt;) and whose first character (&) has then been URL-encoded (%26). Implementations should throw an IntrusionException when double-encoded characters are detected.
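
The detection described above can be sketched as a loop that alternates percent-decoding and entity-decoding and counts how many passes changed the data. This is only an illustration under stated assumptions: the class CanonicalizeSketch and its methods are hypothetical and not part of ESAPI, only four named entities are recognized, and IllegalStateException stands in for IntrusionException.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the ESAPI implementation.
final class CanonicalizeSketch {

    // Numeric entities (semicolon optional) and a few named ones (semicolon required).
    private static final Pattern ENTITY = Pattern.compile(
            "&(?:#[xX]([0-9a-fA-F]{1,6});?|#([0-9]{1,7});?|(lt|gt|amp|quot);)");

    /** Decodes %xy sequences as UTF-8 bytes; malformed sequences are kept as-is. */
    static String percentDecode(String s) {
        if (s.indexOf('%') < 0) {
            return s;
        }
        byte[] in = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length) {
                int hi = Character.digit((char) (in[i + 1] & 0xFF), 16);
                int lo = Character.digit((char) (in[i + 2] & 0xFF), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2;
                    continue;
                }
            }
            out.write(in[i]);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Decodes each entity once; decoded output is not rescanned within the same pass. */
    static String entityDecode(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(3) != null) {
                replacement = named(m.group(3));
            } else {
                int cp = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
                replacement = Character.isValidCodePoint(cp)
                        ? new String(Character.toChars(cp))
                        : m.group();
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String named(String name) {
        switch (name) {
            case "lt": return "<";
            case "gt": return ">";
            case "amp": return "&";
            default: return "\""; // "quot"
        }
    }

    /** Returns the simplest form, or throws if more than one decoding pass was needed. */
    static String canonicalize(String input) {
        String current = input;
        int passesThatDecoded = 0;
        while (true) {
            String afterPercent = percentDecode(current);
            if (!afterPercent.equals(current)) {
                passesThatDecoded++;
            }
            String afterEntity = entityDecode(afterPercent);
            if (!afterEntity.equals(afterPercent)) {
                passesThatDecoded++;
            }
            if (afterEntity.equals(current)) {
                break;
            }
            current = afterEntity;
        }
        if (passesThatDecoded > 1) {
            // Stand-in for IntrusionException in this sketch.
            throw new IllegalStateException("multiple or mixed encoding detected");
        }
        return current;
    }
}
```

With this sketch, "%3Cscript%3E" canonicalizes to "<script>", while "%26lt;" (mixed encoding) and "%253C" (double URL encoding) are both rejected.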

Note that there is also "multipart/form-data" encoding, which allows files and other binary data to be transmitted. Each part of a multipart form can itself be encoded according to a "Content-Transfer-Encoding" header. See the HTTPUtilities.getSafeFileUploads() method.

For more information on form encoding, please refer to the W3C specifications.

Parameters:
input - unvalidated input from an HTTP request
Returns:
the canonicalized string
Throws:
IntrusionException - if there is a canonicalization problem
EncodingException

normalize

public java.lang.String normalize(java.lang.String input)
Reduce all non-ASCII characters to their ASCII form so that simpler validation rules can be applied. For example, an accented-e character will be changed into a regular ASCII e character.

Parameters:
input - the input to normalize
Returns:
the normalized string

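One plausible way to implement the accented-e example is Unicode decomposition followed by removal of everything outside ASCII. The sketch below assumes that approach; the class name NormalizeSketch is hypothetical, and characters with no ASCII base form (for example, the German sharp s) are simply dropped.

```java
import java.text.Normalizer;

// Hypothetical sketch, not the ESAPI implementation.
final class NormalizeSketch {

    /** Decomposes accented characters (NFD), then drops every non-ASCII char. */
    static String normalize(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder ascii = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (c < 128) {
                ascii.append(c); // keep base letters, drop combining marks
            }
        }
        return ascii.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("caf\u00e9")); // prints "cafe"
    }
}
```
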
encodeForHTML

public java.lang.String encodeForHTML(java.lang.String input)
Encode data for use in HTML content. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is entity-encoded using a whitelist.

Parameters:
input - the input
Returns:
the string
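
The whitelist step can be sketched as follows: every character is written as a decimal numeric entity unless it is an ASCII letter, a digit, or a member of a small immune set. The class name HtmlEncodeSketch and the particular immune characters are assumptions made for illustration, and the canonicalization step is omitted.

```java
// Hypothetical sketch, not the ESAPI implementation.
final class HtmlEncodeSketch {

    // Assumed immune set; a real implementation chooses its own list.
    private static final String IMMUNE = ",.-_ ";

    static String encodeForHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (isAsciiAlphanumeric(cp) || IMMUNE.indexOf(cp) >= 0) {
                out.appendCodePoint(cp);                   // known safe: pass through
            } else {
                out.append("&#").append(cp).append(';');   // everything else: encode
            }
        }
        return out.toString();
    }

    private static boolean isAsciiAlphanumeric(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') || (cp >= '0' && cp <= '9');
    }
}
```

Because the default branch encodes, a character the author never thought about is still neutralized, which is the point of the positive model.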

encodeForHTMLAttribute

public java.lang.String encodeForHTMLAttribute(java.lang.String input)
Encode data for use in HTML attributes. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is entity-encoded using a whitelist.

Parameters:
input - the input
Returns:
the string

encodeForJavascript

public java.lang.String encodeForJavascript(java.lang.String input)
Encode for JavaScript. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

Parameters:
input - the input
Returns:
the string
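
For JavaScript string contexts, a whitelist encoder might emit \xHH or \uHHHH escapes for everything except ASCII letters and digits. The sketch below assumes that escape format and an empty immune set; the class name JavascriptEncodeSketch is hypothetical, and canonicalization is again left out.

```java
// Hypothetical sketch, not the ESAPI implementation.
final class JavascriptEncodeSketch {

    static String encodeForJavascript(String input) {
        StringBuilder out = new StringBuilder(input.length() * 4);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                out.append(c);
            } else if (c < 256) {
                out.append(String.format("\\x%02X", (int) c)); // two-digit hex escape
            } else {
                out.append(String.format("\\u%04X", (int) c)); // one escape per UTF-16 unit
            }
        }
        return out.toString();
    }
}
```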

encodeForVBScript

public java.lang.String encodeForVBScript(java.lang.String input)
Encode data for use in Visual Basic Script. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

Parameters:
input - the input
Returns:
the string

encodeForSQL

public java.lang.String encodeForSQL(java.lang.String input)
Encode for SQL. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

Parameters:
input - the input
Returns:
the string

encodeForLDAP

public java.lang.String encodeForLDAP(java.lang.String input)
Encode data for use in LDAP queries. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

Parameters:
input - the input
Returns:
the string
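
For LDAP search filters, RFC 4515 allows any octet to be written as a backslash followed by two hex digits, which fits the whitelist model well. The sketch below assumes that only ASCII letters and digits are immune; the class name LdapEncodeSketch is hypothetical, and canonicalization is omitted.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the ESAPI implementation.
final class LdapEncodeSketch {

    static String encodeForLdapFilter(String input) {
        StringBuilder out = new StringBuilder(input.length() * 3);
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean alphanumeric = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9');
            if (alphanumeric) {
                out.append((char) v);
            } else {
                out.append(String.format("\\%02x", v)); // escape the raw UTF-8 octet
            }
        }
        return out.toString();
    }
}
```

An injection attempt such as *)(uid=* comes out as \2a\29\28uid\3d\2a, so the filter metacharacters lose their meaning.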

encodeForDN

public java.lang.String encodeForDN(java.lang.String input)
Encode data for use in an LDAP distinguished name. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

Parameters:
input - the input
Returns:
the string

encodeForXPath

public java.lang.String encodeForXPath(java.lang.String input)
Encode data for use in an XPath query. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

Parameters:
input - the input
Returns:
the string

encodeForXML

public java.lang.String encodeForXML(java.lang.String input)
Encode data for use in an XML element. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist. The implementation should follow the XML Encoding Standard from the W3C.

The use of a real XML parser is strongly encouraged. However, in the hopefully rare case that you need to make sure that data is safe for inclusion in an XML document and cannot use a parser, this method provides a safe mechanism to do so.

Parameters:
input - the input
Returns:
the string

encodeForXMLAttribute

public java.lang.String encodeForXMLAttribute(java.lang.String input)
Encode data for use in an XML attribute. The implementation should follow the XML Encoding Standard from the W3C. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

The use of a real XML parser is highly encouraged. However, in the hopefully rare case that you need to make sure that data is safe for inclusion in an XML document and cannot use a parser, this method provides a safe mechanism to do so.

Parameters:
input - the input
Returns:
the string

encodeForURL

public java.lang.String encodeForURL(java.lang.String input)
                              throws EncodingException
Encode for use in a URL. This method performs URL encoding on the entire string. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is encoded using a whitelist.

Parameters:
input - the input
Returns:
the string
Throws:
EncodingException

decodeFromURL

public java.lang.String decodeFromURL(java.lang.String input)
                               throws EncodingException
Decode from URL. This method first canonicalizes and detects any double-encoding. If this check passes, then the data is decoded using URL decoding.

Parameters:
input - the input
Returns:
the string
Throws:
java.io.IOException - Signals that an I/O exception has occurred.
EncodingException
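
A minimal sketch of the URL encode and decode pair, built on the JDK's java.net.URLEncoder and java.net.URLDecoder, shows where a checked encoding exception would come from. The class UrlCodecSketch and the exception SketchEncodingException are hypothetical stand-ins (the latter for EncodingException), UTF-8 is an assumed character set, and the canonicalization step is omitted.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical sketch, not the ESAPI implementation.
final class UrlCodecSketch {

    /** Local stand-in for EncodingException. */
    static final class SketchEncodingException extends Exception {
        SketchEncodingException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static String encodeForUrl(String input) throws SketchEncodingException {
        try {
            // Form-style encoding: unsafe characters become %xy, spaces become '+'.
            return URLEncoder.encode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new SketchEncodingException("character set not supported", e);
        }
    }

    static String decodeFromUrl(String input) throws SketchEncodingException {
        try {
            return URLDecoder.decode(input, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            // IllegalArgumentException signals a malformed %xy sequence.
            throw new SketchEncodingException("input could not be URL-decoded", e);
        }
    }
}
```

In this sketch, encodeForUrl("a b&c=d/e") yields "a+b%26c%3Dd%2Fe", and decodeFromUrl reverses it.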

encodeForBase64

public java.lang.String encodeForBase64(byte[] input,
                                        boolean wrap)
Encode for base64.

Beware double-encoding, as this will corrupt the results and could possibly cause a downstream security mechanism to make a mistake.

Parameters:
input - the input
Returns:
the string

decodeFromBase64

public byte[] decodeFromBase64(java.lang.String input)
                        throws java.io.IOException
Decode data encoded with BASE-64 encoding.

Beware double-encoded data, as the results of this method could still contain encoded characters as part of attacks.

Parameters:
input - the input
Returns:
the byte[]
Throws:
java.io.IOException - Signals that an I/O exception has occurred.
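
The Base64 pair can be sketched with java.util.Base64 from the JDK. This assumes that the wrap parameter selects line-wrapped (MIME-style) output, which the text above does not state; the class name Base64Sketch is hypothetical. Note that the JDK decoder reports malformed input with IllegalArgumentException rather than java.io.IOException.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch, not the ESAPI implementation.
final class Base64Sketch {

    /** Assumption: wrap == true means MIME-style output with CRLF every 76 characters. */
    static String encodeForBase64(byte[] input, boolean wrap) {
        Base64.Encoder encoder = wrap ? Base64.getMimeEncoder() : Base64.getEncoder();
        return encoder.encodeToString(input);
    }

    /** The MIME decoder ignores line separators, so it accepts both output styles. */
    static byte[] decodeFromBase64(String input) {
        return Base64.getMimeDecoder().decode(input);
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(encodeForBase64(data, false)); // prints "aGVsbG8="
    }
}
```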