org.aitools.programd.util
Class XMLKit

java.lang.Object
  extended by org.aitools.programd.util.XMLKit

public class XMLKit
extends Object

A collection of XML utilities.


Field Summary
static String CDATA_END
          CDATA end marker.
static String CDATA_START
          CDATA start marker.
static String COMMENT_END
          Comment end marker.
static String COMMENT_START
          Comment start marker.
protected static String EQUAL_QUOTE
          A common string we search for when parsing attributes in tags.
protected static char QUOTE_MARK
          A quote mark, for convenience.
protected static DocumentBuilder utilBuilder
          A DocumentBuilder for producing new documents.
protected static Document utilDoc
          A document for producing new elements.
protected static String WHITESPACE_REGEX
          The regex for whitespace.
 
Constructor Summary
XMLKit()
           
 
Method Summary
static String convertXMLUnicodeEntities(String input)
           Converts XML Unicode character entities into their character equivalents within a given string.
static int elementCount(NodeList list)
          Returns the number of elements in the nodelist and its descendants.
static String escapeXMLChars(char[] ch, int start, int length)
          Like escapeXMLChars(String), but takes an array of chars instead of a String.
static String escapeXMLChars(String input)
           Replaces the characters &, <, >, ' and " with their "escaped" equivalents.
static String[] filterViaHTMLTags(String input)
           Breaks a message into multiple lines at each HTML <br/> (unless it comes at the beginning of the message) and at each closing HTML </p>.
static String filterWhitespace(String input)
           Filters all whitespace: line separators and multiple consecutive spaces are replaced with a single space, and any leading or trailing whitespace characters are removed.
static String filterXML(String input)
          Removes all characters that are not considered XML characters from the input.
static List<Element> getAllElementsNamed(Element element, String name)
          Returns all the elements with the given name that are children of the given element, or null if there are no such elements.
static String getChildText(Element element, String childName)
          Gets the text of the named child of the given element.
static String getDeclaredXMLEncoding(InputStream in)
          Returns the declared encoding string from the XML resource supposedly connected to a given InputStream, or the system default if none is found.
static DocumentBuilder getDocumentBuilder(URL schemaLocation, String schemaDescription)
          Sets up a DocumentBuilder that is schema-aware, processes XIncludes, and is set to use the schema at the given location.
static List<Element> getElementChildrenOf(Element element)
          Returns the element children of the given element.
static Element getFirstElementChildOf(Element element)
          Returns the first element child of the given element.
static Element getFirstElementIn(NodeList list)
          Returns the first element member (if there is one) of the given nodelist.
static Element getFirstElementNamed(Element element, String name)
          Returns the first element with the given name that is a child of the given element, or null if there is no such element.
static SAXParser getSAXParser(URL schemaLocation, String schemaDescription)
          Sets up a SAX parser that is schema-aware, processes XIncludes, and is set to use the schema at the given location.
static Schema getSchema(URL schemaLocation, String schemaDescription)
          Attempts to get the schema at the given location.
static String getSpaces(int count)
           Returns a string consisting of the given number of spaces.
static Document parseAsDocumentFragment(String text)
           Parses the given text as a document fragment.
static String removeMarkup(String input)
          Removes all tags from a string (retains character content of tags, however).
static String renderEmptyElement(Element element, boolean includeNamespaceAttribute)
          Renders a given element as an empty element, including a namespace declaration, if requested.
static String renderEndTag(Element element)
          Renders a given element as an end tag.
static String renderStartTag(Element element, boolean includeNamespaceAttribute)
          Renders a given element as a start tag, including a namespace declaration, if requested.
static String renderStartTag(String elementName, Attributes attributes, boolean includeNamespaceAttribute, String namespaceURI)
          Renders a given element name and set of attributes as a start tag, including a namespace declaration, if requested.
static String renderXML(NodeList list, boolean indent)
          Formats XML from a node list into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).
static String renderXML(NodeList list, boolean includeNamespaceAttribute, boolean indent)
          Formats XML from a node list into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).
static String renderXML(NodeList list, int level, boolean atStart, boolean includeNamespaceAttribute, boolean indent)
          Formats XML from a node list into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).
static String renderXML(String content, boolean includeNamespaceAttribute, boolean indent)
          Formats XML from a single long string into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).
static String unescapeXMLChars(String input)
           Replaces the "escape" strings &amp;, &lt;, &gt;, &apos; and &quot; with their character equivalents.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CDATA_START

public static final String CDATA_START
CDATA start marker.

See Also:
Constant Field Values

CDATA_END

public static final String CDATA_END
CDATA end marker.

See Also:
Constant Field Values

COMMENT_START

public static final String COMMENT_START
Comment start marker.

See Also:
Constant Field Values

COMMENT_END

public static final String COMMENT_END
Comment end marker.

See Also:
Constant Field Values

EQUAL_QUOTE

protected static final String EQUAL_QUOTE
A common string we search for when parsing attributes in tags.

See Also:
Constant Field Values

QUOTE_MARK

protected static final char QUOTE_MARK
A quote mark, for convenience.

See Also:
Constant Field Values

WHITESPACE_REGEX

protected static final String WHITESPACE_REGEX
The regex for whitespace.

See Also:
Constant Field Values

utilBuilder

protected static DocumentBuilder utilBuilder
A DocumentBuilder for producing new documents.


utilDoc

protected static Document utilDoc
A document for producing new elements.

Constructor Detail

XMLKit

public XMLKit()

Method Detail

unescapeXMLChars

public static String unescapeXMLChars(String input)

Replaces the following "escape" strings with their character equivalents:

  • &amp; with &
  • &lt; with <
  • &gt; with >
  • &apos; with '
  • &quot; with "

Parameters:
input - the string on which to perform the replacement
Returns:
the string with entities replaced
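
The replacement table above can be sketched with plain String operations. This is an illustration only, not the XMLKit source: the class name UnescapeSketch is hypothetical, and handling &amp; last is an assumption made so that an input such as &amp;lt; yields &lt; rather than <.

```java
final class UnescapeSketch {
    /** Replaces the five predefined XML entities with their characters. */
    static String unescape(String input) {
        // "&amp;" is replaced last; doing it first would turn "&amp;lt;"
        // into "&lt;" and then, incorrectly, into "<".
        return input.replace("&lt;", "<")
                    .replace("&gt;", ">")
                    .replace("&apos;", "'")
                    .replace("&quot;", "\"")
                    .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(unescape("a &lt; b &amp;&amp; c &gt; d"));
    }
}
```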

escapeXMLChars

public static String escapeXMLChars(String input)

Replaces the following characters with their "escaped" equivalents:

  • & with &amp;
  • < with &lt;
  • > with &gt;
  • ' with &apos;
  • " with &quot;

Parameters:
input - the string on which to perform the replacement
Returns:
the string with the listed characters replaced by their escaped equivalents

escapeXMLChars

public static String escapeXMLChars(char[] ch,
                                    int start,
                                    int length)
Like escapeXMLChars(String), but takes an array of chars instead of a String. This might be faster (but should be tested).
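
For illustration, the two escapeXMLChars overloads could share one single-pass implementation, as in the sketch below. The class name EscapeSketch is hypothetical and the code is not taken from XMLKit; it only reproduces the five replacements documented above, and makes no claim about relative speed.

```java
final class EscapeSketch {
    /** Escapes the characters in ch[start .. start + length - 1]. */
    static String escape(char[] ch, int start, int length) {
        StringBuilder out = new StringBuilder(length + 16);
        for (int i = start; i < start + length; i++) {
            switch (ch[i]) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(ch[i]);
            }
        }
        return out.toString();
    }

    /** String overload, delegating to the char[] version. */
    static String escape(String input) {
        return escape(input.toCharArray(), 0, input.length());
    }

    public static void main(String[] args) {
        System.out.println(escape("if (a < b && c > d) say(\"hi\");"));
    }
}
```

Because each input character is examined exactly once, an ampersand produced by a replacement is never escaped a second time.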


filterXML

public static String filterXML(String input)
Removes all characters that are not considered XML characters from the input.

Parameters:
input - the input to filter
Returns:
the input with all non-XML characters removed
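
The documentation does not say which characters count as "XML characters". The sketch below assumes the Char production of XML 1.0 (tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF); the class name XmlCharFilterSketch is hypothetical, and the real method may apply a different definition.

```java
final class XmlCharFilterSketch {
    /** True if the code point matches the Char production of XML 1.0 (assumed definition). */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the input with every code point that fails isXmlChar removed. */
    static String filter(String input) {
        StringBuilder out = new StringBuilder(input.length());
        // Iterating by code point keeps valid surrogate pairs together and
        // drops unpaired surrogates, which fall outside the allowed ranges.
        input.codePoints()
             .filter(XmlCharFilterSketch::isXmlChar)
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("bell\u0007 removed"));
    }
}
```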

convertXMLUnicodeEntities

public static String convertXMLUnicodeEntities(String input)

Converts XML Unicode character entities into their character equivalents within a given string.

This will handle entities in the form &#dddd; (where dddd is a valid decimal character code) or &#xhhhh; (where hhhh is a valid hexadecimal character code).

Parameters:
input - the string to process
Returns:
the input with all XML Unicode character entity codes replaced
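
A minimal sketch of this kind of conversion, using java.util.regex, is shown below. The class name CharRefSketch and the choice to leave malformed or out-of-range references untouched are assumptions; the actual behaviour of convertXMLUnicodeEntities for such input is not documented here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CharRefSketch {
    // Group 1 captures hexadecimal digits, group 2 captures decimal digits.
    private static final Pattern CHAR_REF =
        Pattern.compile("&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    static String convert(String input) {
        Matcher m = CHAR_REF.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            String hex = m.group(1);
            try {
                int cp = (hex != null) ? Integer.parseInt(hex, 16)
                                       : Integer.parseInt(m.group(2));
                if (Character.isValidCodePoint(cp)) {
                    out.appendCodePoint(cp);
                } else {
                    out.append(m.group()); // not a Unicode code point: keep as written
                }
            } catch (NumberFormatException e) {
                out.append(m.group());     // too many digits for an int: keep as written
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("&#65;&#x42;C"));
    }
}
```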

getDeclaredXMLEncoding

public static String getDeclaredXMLEncoding(InputStream in)
                                     throws IOException
Returns the declared encoding string from the XML resource supposedly connected to a given InputStream, or the system default if none is found.

Parameters:
in - the input stream
Returns:
the declared encoding
Throws:
IOException - if there was a problem reading the input stream
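
One way such a lookup might work is to read the first bytes of the stream and search them for an encoding pseudo-attribute. The sketch below is an assumption-laden illustration, not the XMLKit implementation: the class name, the 256-byte limit, and decoding the prolog as ISO-8859-1 (which only works for ASCII-compatible encodings) are all choices made for the example. It also consumes the bytes it reads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DeclaredEncodingSketch {
    private static final Pattern ENCODING = Pattern.compile(
        "<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static String declaredEncoding(InputStream in) throws IOException {
        byte[] head = new byte[256];
        int n = 0;
        // A single read() may return fewer bytes than requested, so loop.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        String prolog = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prolog);
        // Fall back to the platform default when nothing is declared.
        return m.find() ? m.group(1) : Charset.defaultCharset().name();
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/>"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(declaredEncoding(new ByteArrayInputStream(xml)));
    }
}
```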

parseAsDocumentFragment

public static Document parseAsDocumentFragment(String text)
Parses the given text as a document fragment.

Parameters:
text - a document fragment
Returns:
a Document created by parsing the given text as a document fragment
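
How the fragment is turned into a Document is not documented. A common approach, sketched below purely as an assumption, is to wrap the text in a synthetic root element before parsing, since a fragment may contain several top-level nodes. The class name and the root element name "fragment" are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class FragmentParseSketch {
    static Document parseFragment(String text) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // The wrapper guarantees exactly one root element, as XML requires.
        String wrapped = "<fragment>" + text + "</fragment>";
        return builder.parse(new InputSource(new StringReader(wrapped)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseFragment("hello <b>world</b>!");
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

The fragment's nodes are then available as the children of the document element.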

renderXML

public static String renderXML(String content,
                               boolean includeNamespaceAttribute,
                               boolean indent)
Formats XML from a single long string into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).

Parameters:
content - the XML content to format
includeNamespaceAttribute - whether to include the namespace attribute
indent - whether to render the string in an indented, multiline fashion
Returns:
the formatted XML

renderXML

public static String renderXML(NodeList list,
                               boolean indent)
Formats XML from a node list into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false). This is a convenience method that assumes that we should include namespace attributes.

Parameters:
list - the list of XML nodes
indent - whether to render the string in an indented, multiline fashion
Returns:
the formatted XML

renderXML

public static String renderXML(NodeList list,
                               boolean includeNamespaceAttribute,
                               boolean indent)
Formats XML from a node list into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).

Parameters:
list - the list of XML nodes
includeNamespaceAttribute - whether to include the namespace attribute
indent - whether to render the string in an indented, multiline fashion
Returns:
the formatted XML

renderXML

public static String renderXML(NodeList list,
                               int level,
                               boolean atStart,
                               boolean includeNamespaceAttribute,
                               boolean indent)
Formats XML from a node list into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).

Parameters:
list - the list of XML nodes
level - the level (for indenting; no meaning if indenting is off)
atStart - whether the whole XML string is at its beginning
includeNamespaceAttribute - whether to include the namespace attribute
indent - whether to render the string in an indented, multiline fashion
Returns:
the formatted XML
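
To make the roles of level and indent concrete, here is a small recursive renderer over a NodeList. It is a sketch under stated assumptions, not the XMLKit algorithm: the class name, the two-space indent unit, and the decision to trim text nodes when indenting are all hypothetical, and namespace handling, comments, CDATA sections and processing instructions are ignored.

```java
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class RenderSketch {
    private static final String INDENT = "  "; // hypothetical indent unit

    static String render(NodeList list, boolean indent) {
        StringBuilder out = new StringBuilder();
        render(list, 0, indent, out);
        return out.toString();
    }

    private static void render(NodeList list, int level, boolean indent, StringBuilder out) {
        for (int i = 0; i < list.getLength(); i++) {
            Node node = list.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                pad(out, level, indent);
                out.append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int a = 0; a < attrs.getLength(); a++) {
                    Node attr = attrs.item(a);
                    out.append(' ').append(attr.getNodeName()).append("=\"")
                       .append(escape(attr.getNodeValue())).append('"');
                }
                if (node.hasChildNodes()) {
                    out.append('>');
                    newline(out, indent);
                    // Children are rendered one level deeper.
                    render(node.getChildNodes(), level + 1, indent, out);
                    pad(out, level, indent);
                    out.append("</").append(node.getNodeName()).append('>');
                } else {
                    out.append("/>");
                }
                newline(out, indent);
            } else if (node.getNodeType() == Node.TEXT_NODE) {
                // When indenting, surrounding whitespace is treated as layout (an assumption).
                String text = indent ? node.getNodeValue().trim() : node.getNodeValue();
                if (!text.isEmpty()) {
                    pad(out, level, indent);
                    out.append(escape(text));
                    newline(out, indent);
                }
            }
        }
    }

    private static void pad(StringBuilder out, int level, boolean indent) {
        for (int i = 0; indent && i < level; i++) {
            out.append(INDENT);
        }
    }

    private static void newline(StringBuilder out, boolean indent) {
        if (indent) {
            out.append('\n');
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```

With indent set to false the level argument has no effect, which matches the parameter description above.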

filterWhitespace

public static String filterWhitespace(String input)
                               throws StringIndexOutOfBoundsException

Filters all whitespace: line separators and multiple consecutive spaces are replaced with a single space, and any leading or trailing whitespace characters are removed. Any data enclosed in <![CDATA[ ]]> sections, however, is left as-is (including the CDATA markers).

Parameters:
input - the input to filter
Returns:
the input with whitespace filtered
Throws:
StringIndexOutOfBoundsException - if there is malformed text in the input.
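
The CDATA exception described above can be illustrated as follows. This sketch is not the XMLKit code: the class name is hypothetical, the marker constants are redefined locally, and, unlike the documented method, it copies an unterminated CDATA section through to the end of the input instead of throwing StringIndexOutOfBoundsException.

```java
final class WhitespaceFilterSketch {
    static final String CDATA_START = "<![CDATA[";
    static final String CDATA_END = "]]>";

    static String filter(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int pos = 0;
        while (pos < input.length()) {
            int start = input.indexOf(CDATA_START, pos);
            if (start < 0) {
                // No further CDATA section: collapse the remainder and stop.
                out.append(collapse(input.substring(pos)));
                break;
            }
            out.append(collapse(input.substring(pos, start)));
            int end = input.indexOf(CDATA_END, start);
            // Assumption: an unterminated section runs to the end of the input.
            int stop = (end < 0) ? input.length() : end + CDATA_END.length();
            out.append(input, start, stop); // copied verbatim, markers included
            pos = stop;
        }
        return out.toString().trim();
    }

    /** Replaces every run of whitespace with a single space. */
    static String collapse(String s) {
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(filter("  a \n\n b <![CDATA[ x   y ]]>  c  "));
    }
}
```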

elementCount

public static int elementCount(NodeList list)
Returns the number of elements in the nodelist and its descendants. Useful for seeing whether there are no elements, only text.

Parameters:
list - a list of nodes
Returns:
the number of elements in the nodelist and its descendants
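
A recursive count of this kind might look like the sketch below, written against the standard org.w3c.dom interfaces. The class name is hypothetical and the code is an illustration, not the XMLKit source.

```java
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ElementCountSketch {
    /** Counts element nodes in the list and in all of their descendants. */
    static int elementCount(NodeList list) {
        int count = 0;
        for (int i = 0; i < list.getLength(); i++) {
            Node node = list.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                // One for the element itself, plus everything beneath it.
                count += 1 + elementCount(node.getChildNodes());
            }
        }
        return count;
    }
}
```

A result of 0 means the list holds no elements at all, for example only text.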

filterViaHTMLTags

public static String[] filterViaHTMLTags(String input)

Breaks a message into multiple lines at each HTML <br/> (unless it comes at the beginning of the message) and at each closing HTML </p>. All other tags are simply removed.

Generally used to format output nicely for a console.

Parameters:
input - the string to break
Returns:
one line per array item
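
One reading of this description is sketched below with regular expressions. The class name is hypothetical and several details are guesses: tag matching is case-insensitive, only a single leading <br/> is dropped, and trailing empty lines are discarded by String.split. The real method may differ on each point.

```java
import java.util.Arrays;

final class HtmlLineBreakSketch {
    static String[] breakLines(String input) {
        // A <br/> at the very beginning does not start a new line.
        String s = input.replaceFirst("(?i)^\\s*<br\\s*/?>", "");
        // Every remaining <br/> and every closing </p> becomes a line break.
        s = s.replaceAll("(?i)<br\\s*/?>|</p\\s*>", "\n");
        // All other tags are removed, keeping their character content.
        s = s.replaceAll("<[^>]*>", "");
        return s.split("\n");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            breakLines("<br/>Hello<br/><p>How are <b>you</b>?</p>")));
    }
}
```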

removeMarkup

public static String removeMarkup(String input)
Removes all tags from a string (retains character content of tags, however).

Parameters:
input - the string from which to remove markup
Returns:
the input without tags
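
As a rough illustration, tag removal can be approximated with a single regular expression. The class name is hypothetical, and the pattern is deliberately naive: it assumes no '>' appears inside attribute values, comments or CDATA sections, which the real method may handle differently.

```java
final class RemoveMarkupSketch {
    /** Deletes everything between '<' and the next '>', inclusive. */
    static String removeMarkup(String input) {
        return input.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(removeMarkup("<p>Hello <b>world</b></p>"));
    }
}
```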

renderStartTag

public static String renderStartTag(Element element,
                                    boolean includeNamespaceAttribute)
Renders a given element as a start tag, including a namespace declaration, if requested.

Parameters:
element - the element to render
includeNamespaceAttribute - whether to include the namespace attribute
Returns:
the rendering of the element

renderStartTag

public static String renderStartTag(String elementName,
                                    Attributes attributes,
                                    boolean includeNamespaceAttribute,
                                    String namespaceURI)
Renders a given element name and set of attributes as a start tag, including a namespace declaration, if requested.

Parameters:
elementName - the name of the element to render
attributes - the attributes to include
includeNamespaceAttribute - whether or not to include the namespace attribute
namespaceURI - the namespace URI
Returns:
the rendering result
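
The following sketch shows one plausible rendering from a SAX Attributes object. It is not the XMLKit implementation: the class name is hypothetical, the namespace is emitted as a default xmlns declaration (an assumption that suits unprefixed element names), and attribute values are escaped, which the documentation above does not promise. The URI used in main is a placeholder.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

final class StartTagSketch {
    static String renderStartTag(String elementName, Attributes attributes,
                                 boolean includeNamespaceAttribute, String namespaceURI) {
        StringBuilder tag = new StringBuilder("<").append(elementName);
        if (includeNamespaceAttribute && namespaceURI != null && !namespaceURI.isEmpty()) {
            tag.append(" xmlns=\"").append(escape(namespaceURI)).append('"');
        }
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                tag.append(' ').append(attributes.getQName(i))
                   .append("=\"").append(escape(attributes.getValue(i))).append('"');
            }
        }
        return tag.append('>').toString();
    }

    /** Escapes the characters that are unsafe inside a double-quoted attribute value. */
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "name", "name", "CDATA", "topic");
        // Placeholder namespace URI, for illustration only.
        System.out.println(renderStartTag("set", attrs, true, "http://example.org/ns"));
    }
}
```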

renderEmptyElement

public static String renderEmptyElement(Element element,
                                        boolean includeNamespaceAttribute)
Renders a given element as an empty element, including a namespace declaration, if requested.

Parameters:
element - the element to render
includeNamespaceAttribute - whether to include the namespace attribute
Returns:
the result of the rendering

renderEndTag

public static String renderEndTag(Element element)
Renders a given element as an end tag.

Parameters:
element - the element to render
Returns:
the result of the rendering

getSpaces

public static String getSpaces(int count)
Returns a string consisting of the given number of spaces.

Parameters:
count - the number of spaces to return
Returns:
a string consisting of the given number of spaces

getSAXParser

public static SAXParser getSAXParser(URL schemaLocation,
                                     String schemaDescription)
Sets up a SAX parser that is schema-aware, processes XIncludes, and is set to use the schema at the given location.

Parameters:
schemaLocation - location of the schema to use
schemaDescription - short (one word or so) description of the schema
Returns:
the parser

getDocumentBuilder

public static DocumentBuilder getDocumentBuilder(URL schemaLocation,
                                                 String schemaDescription)
Sets up a DocumentBuilder that is schema-aware, processes XIncludes, and is set to use the schema at the given location.

Parameters:
schemaLocation - location of the schema to use
schemaDescription - short (one word or so) description of the schema
Returns:
the document builder

getSchema

public static Schema getSchema(URL schemaLocation,
                               String schemaDescription)
Attempts to get the schema at the given location.

Parameters:
schemaLocation - location of the schema to use
schemaDescription - short (one word or so) description of the schema
Returns:
the schema
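
With the standard JAXP classes, schema-aware and XInclude-aware parsers of the kind described for getSAXParser, getDocumentBuilder and getSchema can be configured roughly as below. This is a sketch, not the XMLKit code: the class and method names are hypothetical, a W3C XML Schema is assumed, and the schemaDescription parameter (presumably used for messages) is left out.

```java
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class SchemaAwareParsersSketch {
    /** Loads a W3C XML Schema from the given location. */
    static Schema loadSchema(URL schemaLocation) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(schemaLocation);
    }

    static SAXParser newSaxParser(URL schemaLocation)
            throws SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // schema validation needs namespaces
        factory.setXIncludeAware(true);    // resolve xi:include elements
        factory.setSchema(loadSchema(schemaLocation));
        return factory.newSAXParser();
    }

    static DocumentBuilder newDocumentBuilder(URL schemaLocation)
            throws SAXException, ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setXIncludeAware(true);
        factory.setSchema(loadSchema(schemaLocation));
        return factory.newDocumentBuilder();
    }
}
```

Validation problems are reported through the ErrorHandler (for DOM) or the handler passed to parse (for SAX), so callers that want failures to stop processing need to supply one that throws.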

getElementChildrenOf

public static List<Element> getElementChildrenOf(Element element)
Returns the element children of the given element.

Parameters:
element - the element whose children are wanted
Returns:
the element children of the given element

getFirstElementChildOf

public static Element getFirstElementChildOf(Element element)
Returns the first element child of the given element.

Parameters:
element - the element whose child is wanted
Returns:
the first element child of the given element
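
The two child-lookup methods above can be illustrated with a walk over the DOM sibling chain. The class name is hypothetical, and returning an empty list or null when there are no element children is an assumption; the documentation above does not state what XMLKit returns in that case.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class ElementChildrenSketch {
    /** Collects the direct children that are elements, skipping text and comments. */
    static List<Element> elementChildrenOf(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                result.add((Element) n);
            }
        }
        return result;
    }

    /** Returns the first direct child that is an element, or null (assumed) if none. */
    static Element firstElementChildOf(Element parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        return null;
    }
}
```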

getFirstElementIn

public static Element getFirstElementIn(NodeList list)
Returns the first element member (if there is one) of the given nodelist.

Parameters:
list - the nodes to scan
Returns:
the first element member of the given list

getAllElementsNamed

public static List<Element> getAllElementsNamed(Element element,
                                                String name)
Returns all the elements with the given name that are children of the given element, or null if there are no such elements.

Parameters:
element - the element whose children should be examined
name - the name of the element desired
Returns:
the desired elements, or null

getFirstElementNamed

public static Element getFirstElementNamed(Element element,
                                           String name)
Returns the first element with the given name that is a child of the given element, or null if there is no such element.

Parameters:
element - the element whose children should be examined
name - the name of the element desired
Returns:
the desired element, or null

getChildText

public static String getChildText(Element element,
                                  String childName)
Gets the text of the named child of the given element.

Parameters:
element - the element whose child is wanted
childName - the name of the child whose text is wanted
Returns:
the text of the named child of the given element
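
A lookup of this kind can be built from a first-named-child search followed by a text read, as sketched below. The class name is hypothetical, and three details are assumptions: names are compared against the node's qualified name, text is obtained with getTextContent, and null is returned when the child is missing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NamedChildSketch {
    /** Returns the first direct child element with the given name, or null. */
    static Element firstElementNamed(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Returns the text content of the named child, or null (assumed) if it is absent. */
    static String childText(Element parent, String childName) {
        Element child = firstElementNamed(parent, childName);
        return (child == null) ? null : child.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader("<bot><name>Alice</name><id>7</id></bot>")));
        System.out.println(childText(doc.getDocumentElement(), "name"));
    }
}
```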