XML
See the following sections to parse XML input or format XML output.
- Setting XML Stream Properties
- Configuring Prefixes and Namespaces
- Referencing External Entities
- Defining the Stream Fields
- Outputting an Empty Stream
Setting XML Stream Properties
You can set the following properties to control how XML input is parsed and how XML output is formatted. Select the relevant component and expand the Format property in the Stream pane.
| Property name | Value |
|---|---|
| DTD Identifier | Specify the system identifier. This replaces the system identifier when the stream reads external XML outside the Flow Service. For example, the part between quotes is the system identifier: <!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| Read External Entity | Specify whether to load external entities. See Referencing External Entities. |
| Validate DTD | Specify whether to validate the XML stream against the DTD. For an external DTD, see Referencing External Entities. |
| Output Encoding | Select the stream's encoding. When reading an external XML document, if the external XML specifies the encoding, this property is not used. |
| Linefeed | Specify the line-feed characters in the XML output. Note: CR can't be specified. |
| Output Format | Specify how to format the output XML. |
| Write XML Declaration | Select whether to output the XML declaration. |
| Use Empty Tag | Specify how to output empty elements. |
| Namespace | Define a namespace for a prefix. See Configuring Prefixes and Namespaces. |
Configuring Prefixes and Namespaces
When specifying a prefix for element and attribute names, you must define a namespace to avoid an error. To add a namespace, click the field for the Namespace property, then specify the Prefix and its URI.
Referencing External Entities
You can use a relative URI to reference a DTD or external entity. The relative URI you specify will be resolved using a standard DTD folder for the system. The default DTD folder for the system is [DATA_DIR]/system/schema.
Example
To use the relative path below, put the file test.dtd in the system DTD folder.
<!DOCTYPE test SYSTEM "test.dtd">
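As a rough illustration of the lookup described above, the following sketch maps a system identifier to a location. The helper name resolve_system_id and the data directory /opt/data are hypothetical, and leaving absolute URIs untouched is an assumption; the product's actual resolution logic is not documented here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_system_id(system_id: str, data_dir: str) -> str:
    """Hypothetical helper: map a system identifier to a location."""
    # Assumption: identifiers with a scheme (such as http) are left untouched.
    if urlparse(system_id).scheme:
        return system_id
    # Relative identifiers are looked up under [DATA_DIR]/system/schema.
    return str(PurePosixPath(data_dir) / "system" / "schema" / system_id)


print(resolve_system_id("test.dtd", "/opt/data"))
```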
Defining the Stream Fields
See below to map the stream's fields to values in the XML.
- Setting the Field Properties
- Defining the XML Structure
- Defining the Field Names
- Extracting Field Values
- Determining the Record Numbers
Setting the Field Properties
In the Stream pane, you define the fields to match the structure of the XML.
| FieldName | Specify the name of the element or attribute. |
|---|---|
| Type | Select String, Boolean, Integer, Double, Decimal, or Datetime. (You can't specify the binary type here.) |
| Repeat | Specify whether the element or attribute can occur multiple times. See Defining the XML Structure below. |
| Node Type | Select Element or Attribute. |
| Label | Enter a display name for use in the mapping window. |
Note
You can't do the following:
- Define more than one top-level element.
- Set the Repeat property for an element when the same element is defined again at the same level. This results in a compile-time error. (Otherwise, you can define the same element more than once at the same level.)
- Define the same attribute more than once at the same level.
- Set the Repeat property for the top-level element.
Defining the XML Structure
Right-click a field and then click Ascend Tree or Descend Tree to move a field up or down in the hierarchy.
Defining the Field Names
The field name uniquely identifies the field in the XML. See below for example field definitions, details on validation of field names, and details on processing.
Example Field Definitions
See below for example definitions for elements and attributes.
Defining prefixes: The last field of the example uses the prefix "x". For every prefix you use, you must define the namespace URI. To do so, click the Namespace field in the Stream pane.
| FieldName | Repeat | Node type | XPath |
|---|---|---|---|
| root | None | Element | /root |
| record | Exist | Element | /root/record |
| attr1 | None | Attribute | /root/record/@attr1 |
| element1 | None | Element | /root/record/element1[1] |
| element1 | None | Element | /root/record/element1[2] |
| element2 | None | Element | /root/record/element2 |
| x:element3 | Exist | Element | /root/record/element2/x:element3 |
Mapping Fields to XML
The Flow Service internally uses XPath expressions to identify the field in the XML. The XPath expression is an absolute path from the document root.
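To see how paths like those in the example table select values, here is a minimal sketch using Python's xml.etree.ElementTree. It is a stand-in for illustration only, not the engine the Flow Service uses; the sample document, its values, and the namespace URI are assumptions. ElementTree supports only a subset of XPath and evaluates paths relative to an element, so the leading /root is dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped like the example field definitions.
XML = """<root xmlns:x="http://example.com/ns/x">
  <record attr1="a1">
    <element1>first</element1>
    <element1>second</element1>
    <element2><x:element3>e3</x:element3></element2>
  </record>
</root>"""

root = ET.fromstring(XML)
ns = {"x": "http://example.com/ns/x"}

# Paths are relative to <root>, so "/root/record/..." becomes "record/...".
record = root.find("record")
fields = {
    "attr1": record.get("attr1"),  # /root/record/@attr1
    "element1[1]": root.find("record/element1[1]").text,
    "element1[2]": root.find("record/element1[2]").text,
    "x:element3": root.find("record/element2/x:element3", ns).text,
}
print(fields)
```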
Validating Field Names
Note the conditions for a field name to be valid:
- You can't use the following characters, the tab character, or the single-byte blank space:
!"#$%&'()=~^|\@`+*;:{}[],.<>/?
- There's no limit to the length of the name.
- The name is case sensitive.
- Non-ASCII characters are also supported.
- The field names must be valid XML element names or attribute names.
Extracting Field Values
If a field's node type is an element, the field value maps to the content of the element. If the node type is an attribute, the field value maps to the value of the attribute. If there is no element or attribute that matches the field definition, the field value maps to null.
The field value of an element that has subelements will be a string that contains the subelements' text joined together. For example, if the XML is as below, the <p> element's value will be "abcdefghi".
<p>
abc
<a href="http://foo.bar/">
def
</a>
ghi
</p>
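The concatenation rule can be reproduced with Python's xml.etree.ElementTree, used here only as an illustration. The sample is written without indentation so that no whitespace text enters the result; how the Flow Service treats whitespace between elements is not covered by this sketch.

```python
import xml.etree.ElementTree as ET

# The <p> example from above, written on one line.
p = ET.fromstring('<p>abc<a href="http://foo.bar/">def</a>ghi</p>')

# Element value: the text of the element and its subelements, joined.
value = "".join(p.itertext())
print(value)  # abcdefghi

# Attribute value: read directly from the attribute.
href = p.find("a").get("href")
print(href)  # http://foo.bar/

# No matching element: the lookup yields None (null).
missing = p.find("b")
print(missing)  # None
```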
Determining the Record Numbers
The repeat element separates the XML document into records. The record number (the number of records) is determined as follows.
- If you don't define a repeat element in the field definition, there's only one record.
- If you define only one repeat element, the record number is the number of repetitions of that element.
- If you define multiple repeat elements that do not have a parent-child relationship, the total record number is the product of the repeat elements' record numbers.
- If you define multiple repeat elements that have a parent-child relationship, each occurrence of the parent has as many records as it has repetitions of the child, and the total record number is the sum over all occurrences of the parent.
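The rules above can be sketched as a small calculation. This is one reading of the rules, not the product's implementation; the helper names count_independent and count_nested and both sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from math import prod


def count_independent(root, repeat_paths):
    """Product of the repetition counts of unrelated repeat elements."""
    # prod() of an empty sequence is 1: no repeat element, one record.
    return prod(len(root.findall(path)) for path in repeat_paths)


def count_nested(root, parent_path, child_tag):
    """Sum, over every parent occurrence, of its child repetitions."""
    return sum(len(parent.findall(child_tag))
               for parent in root.findall(parent_path))


# Two <a> and three <b> elements that are siblings (no parent-child relation).
flat = ET.fromstring("<root><a/><a/><b/><b/><b/></root>")
# Two <record> parents holding two and three <item> children.
nested = ET.fromstring(
    "<root>"
    "<record><item/><item/></record>"
    "<record><item/><item/><item/></record>"
    "</root>"
)

print(count_independent(flat, []))             # 1
print(count_independent(flat, ["a"]))          # 2
print(count_independent(flat, ["a", "b"]))     # 6
print(count_nested(nested, "record", "item"))  # 5
```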
Note
The record number of the stream and the record number of the mapper may not match, because the mapper reconstructs the field definitions using only the fields that are being mapped.
Outputting an Empty Stream
An empty XML stream consists of the following XML:
<?xml version="1.0" encoding="utf-8"?>
<root/>