Validating XML Files with Schemas
Introduction
What is XML
XML Schemas
Helix Producer Settings Files
Helix Producer XML Schemas
XML Schema Validators
Validating against the Helix Producer XML Schema
Xerces C++
XML Spy
Visual Basic
Warning: This document is a work in progress. Some of the examples written for Visual Basic XML Schema validation have not been tested yet. If you find errors or have corrections or feedback, please send them to dev@helix-producer.helixcommunity.org.
This document provides information on using XML Schemas to validate Helix Producer job files, audience files and server files. XML Schemas provide a powerful method of validating that all information in a given XML document complies with the specified syntax and structure and that all data values are specified with the correct data types. XML Schemas can also be used to guide user input in graphical user interfaces or provide a generic means of describing the Helix Producer job, audience and server data models.
For background on XML and editing Helix Producer XML settings files, please see Editing Helix Producer XML Settings Files.
Note: This document describes the Helix Producer 2.0 XML settings files for version 10 and later of the Helix Producer SDK and applications built on that SDK. There are minor differences in these files with respect to XML Schema validation. Please refer to former documentation for information on XML settings files used in Helix Producer 9.1 and earlier.
XML is a text markup language defined by the World Wide Web Consortium (W3C, www.w3c.org) that provides a means of organizing data in a text formatted file. Because XML is a W3C recommendation, many applications support reading and writing of XML files. Thus, tools are available to read, write and validate XML files on a wide variety of platforms.
Helix Producer settings files are based on XML standards. Because of this, standards-based tools that manipulate, parse or write XML documents can create Helix DNA Producer settings files. For example, standard XML parsers can be used to parse Helix DNA Producer settings files and load these files into a Document Object Model (DOM) for programmatic manipulation.
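As a minimal sketch of loading a settings file into a DOM, the following Python example uses the standard library's xml.dom.minidom parser. The job document and its child element names are hypothetical placeholders chosen for illustration; they are not taken from the Helix DNA Producer schema.

```python
from xml.dom import minidom

# Hypothetical, heavily trimmed job document used only for illustration.
# The child element names are placeholders, not real Helix settings.
SAMPLE_JOB = """<job xmlns="http://ns.real.com/tools/job.2.0">
  <exampleSetting type="bool">true</exampleSetting>
  <exampleList>
    <exampleItem/>
  </exampleList>
</job>"""


def count_elements(xml_text):
    """Load the document into a DOM tree and count its element nodes."""
    dom = minidom.parseString(xml_text)
    # "*" matches every element in the document, including the root.
    return len(dom.getElementsByTagName("*"))


def root_element_name(xml_text):
    """Return the tag name of the document's root element."""
    return minidom.parseString(xml_text).documentElement.tagName


if __name__ == "__main__":
    print(root_element_name(SAMPLE_JOB), count_elements(SAMPLE_JOB))
```

Once the tree is loaded, the same DOM object can be used to read or change settings programmatically before writing the file back out.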
XML stands for "Extensible Markup Language". As the name suggests, XML is extensible. In this document, we'll cover how XML settings files used by Helix DNA Producer can be extended to support new inputs, outputs, prefilters and codecs.
XML Schemas are data definitions for XML documents. XML Schemas are used to define the structure, content and semantics of XML documents. XML Schema is a successor to the Document Type Definition (DTD). XML Schema validation enables the following:
XML Schemas are defined by the W3C. More information about XML Schemas can be found at http://www.w3.org/XML/Schema. That site lists tools that validate XML documents against XML Schemas as well as tools that allow one to edit XML documents based on an XML Schema.
The XML Schema Part 0: Primer <http://www.w3.org/TR/xmlschema-0/> on the W3C's website is an excellent introduction to XML Schemas. There are also many good books on the subject. One such book is "XML For the World Wide Web" by Elizabeth Castro. This book is a relatively brief introduction to XML, DTDs, XML Schemas and some supporting technologies like XML Namespaces, XSLT and XPath.
Helix DNA Producer uses three settings files that are based on XML. These are:
Audience and server files are actually a subset of the settings available in a job file. These files are a useful way to store these settings independent of any single input or output, and they act as building blocks in the creation of job files. Audience and server files are supported by the Helix DNA Producer SDK to make it easy to build support for saving and reading settings in applications built on the Helix DNA Producer platform.
XML Schemas are defined for each of the three settings files: job, audience and server. These files are located in the docs/XMLSchemas directory in the application directory.
XML Schema files are uniquely identified by what is known as a namespace. The following table identifies the namespace for the three XML Schema file types:
File name | Namespace |
job.2.0.xsd | http://ns.real.com/tools/job.2.0 |
audience.2.0.xsd | http://ns.real.com/tools/audience.2.0 |
server.2.0.xsd | http://ns.real.com/tools/server.2.0 |
The namespace is used in an XML file to identify what XML Schema to use when validating. Namespaces provide a mechanism to ensure uniqueness when validating against multiple XML Schemas. The namespaces in the table above double as a URI to enable schema validators to obtain the relevant XML Schema when validating. See Accessing XML Schemas Online below for more information.
Helix DNA Producer XML Schemas can be accessed online. This is useful for XML Schema validators that have the ability to retrieve XML Schemas via HTTP GET requests. Each XML Schema file can be accessed via HTTP in the following ways:
File name | Namespace | Informational URL | Documentation URL | XML Schema URL |
job.2.0.xsd | http://ns.real.com/tools/job.2.0 | http://ns.real.com/tools/job.2.0/ | http://ns.real.com/tools/job.2.0.html | http://ns.real.com/tools/job.2.0.xsd |
audience.2.0.xsd | http://ns.real.com/tools/audience.2.0 | http://ns.real.com/tools/audience.2.0/ | http://ns.real.com/tools/audience.2.0.html | http://ns.real.com/tools/audience.2.0.xsd |
server.2.0.xsd | http://ns.real.com/tools/server.2.0 | http://ns.real.com/tools/server.2.0/ | http://ns.real.com/tools/server.2.0.html | http://ns.real.com/tools/server.2.0.xsd |
That is, in general, typing in the namespace returns an informational page with links to the documentation for the XML Schema and to the XML Schema itself. Appending ".html" returns the documentation for that XML Schema, and appending ".xsd" returns the XML Schema itself, which can be used to validate the corresponding settings file. Many applications accept the HTTP URL to the XSD file when it is identified in the xsi:schemaLocation attribute of an XML settings file.
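The URL convention in the table above can be sketched in a few lines of Python. The function name schema_urls is an illustrative assumption; only the three URL patterns come from the table.

```python
def schema_urls(namespace):
    """Derive the three HTTP URLs for a Helix Producer schema namespace.

    Follows the pattern in the table above: namespace + "/" for the
    informational page, + ".html" for documentation, + ".xsd" for the schema.
    """
    base = namespace.rstrip("/")
    return {
        "informational": base + "/",
        "documentation": base + ".html",
        "schema": base + ".xsd",
    }


if __name__ == "__main__":
    for kind, url in schema_urls("http://ns.real.com/tools/job.2.0").items():
        print(kind, url)
```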
The Helix DNA Producer XML Schemas can be used with many tools such as:
The above applications can be used to facilitate management of Helix DNA Producer settings files such as:
A comprehensive list of XML Schema tools can be found on the W3C's website at http://www.w3.org/XML/Schema.
This document discusses three tools used for XML Schema validation: Xerces C++ <http://xml.apache.org/xerces-c/index.html>, the Visual Basic programming language and XML Spy <http://www.xmlspy.com/>. Xerces provides an excellent XML parser and validator available as a command line application or an SDK written in C++, Java and PERL. Visual Basic provides powerful tools to parse, manipulate and validate XML documents based on an XML Schema. XML Spy also provides XML parsing and validation support in the form of an excellent GUI for building and validating XML documents.
While Helix DNA Producer itself does not perform XML Schema validation, Helix DNA Producer ships with XML 1.0 compliant XML Schemas for the job, audience and server files. To perform schema validation on Helix DNA Producer settings files, some attributes need to be added to the root element of the settings file in order to identify the XML Schema namespace and file location.
The excerpt from the Helix DNA Producer job file below contains the necessary modifications to enable validating parsers to locate the Helix DNA Producer job file XML Schema:
<job xmlns="http://ns.real.com/tools/job.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://ns.real.com/tools/job.2.0 http://ns.real.com/tools/job.2.0.xsd">
The xmlns attribute in the sample above defines the default XML Namespace for the document. Every XML Schema has an associated namespace that enables validating XML parsers to identify where to find a definition for a given element or attribute. The default namespace applies to any element that is not explicitly prefixed with a namespace identifier and a colon. Attributes are treated differently: the attribute xsi:type is associated with the namespace bound to the identifier xsi, while an unprefixed attribute such as type belongs to no namespace and is validated as part of the element it appears on.
The xsi:schemaLocation attribute in the sample above defines where to find the XML Schema that will be used to validate the XML document. This attribute provides a list of "namespace URI" pairs separated by white space. The URI (Uniform Resource Identifier) defines where to find the XML Schema file for the corresponding namespace. In the example above, the namespace http://ns.real.com/tools/job.2.0 is associated with the file located at http://ns.real.com/tools/job.2.0.xsd. The URI in this case is a URL; however, a URI can be a relative or absolute file path as well.
Finally, the xmlns:xsi attribute is required in order for the validating software to find the definition for xsi:schemaLocation. This attribute defines a namespace for elements and attributes prefixed with the xsi namespace identifier.
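To make the role of these three attributes concrete, here is a minimal Python sketch that reads them back from a root element using the standard library's xml.etree.ElementTree. The function names are illustrative assumptions, and the sketch only inspects the attributes; it does not perform schema validation.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Root element from the job file excerpt above, closed so that it parses
# as a complete (empty) document.
ROOT_ONLY_JOB = (
    '<job xmlns="http://ns.real.com/tools/job.2.0" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://ns.real.com/tools/job.2.0 '
    'http://ns.real.com/tools/job.2.0.xsd"/>'
)


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced tags as "{uri}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def schema_locations(xml_text):
    """Return the xsi:schemaLocation value as a {namespace: location} dict."""
    root = ET.fromstring(xml_text)
    tokens = root.get("{%s}schemaLocation" % XSI_NS, "").split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must hold namespace/URI pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


if __name__ == "__main__":
    print(root_namespace(ROOT_ONLY_JOB))
    print(schema_locations(ROOT_ONLY_JOB))
```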
With the above attributes included in a job file, validating XML parsers can locate and validate a job file. The same can be done with audience or server files using the appropriate namespaces in place of the job file namespace. The examples below demonstrate XML Schema validation in the three sample validating XML parsers introduced above.
Xerces C++ is a set of APIs for validating and manipulating XML documents. Xerces provides several command line sample applications that perform XML validation. These utilities are included in the Xerces C++ SDK download package from http://xml.apache.org/xerces-c/index.html. In the example below, we use DOMCount, which counts the number of elements in an XML document. The usage for the DOMCount utility can be printed using the -? command option as follows:
C:\Xerces>DOMCount.exe -?
Usage:
    DOMCount [options] <XML file | List file>

This program invokes the DOMBuilder, builds the DOM tree,
and then prints the number of elements found in each XML file.

Options:
    -l          Indicate the input file is a List File that has a list of xml files.
                Default to off (Input file is an XML file).
    -v=xxx      Validation scheme [always | never | auto*].
    -n          Enable namespace processing. Defaults to off.
    -s          Enable schema processing. Defaults to off.
    -f          Enable full schema constraint checking. Defaults to off.
    -locale=ll_CC specify the locale, default: en_US.
    -?          Show this help.

    * = Default if not provided explicitly.
To perform full XML Schema validation, we use the -n, -s and -f command options described above.
C:\Xerces>DOMCount -n -s -f job.2.0.rpjf
job.2.0.rpjf: 491 ms (236 elems).
This command loads the job file, locates the XML Schema document identified by the xsi:schemaLocation attribute and then performs validation on the file. In the case above, there were no errors and the number of elements was printed.
If there are any errors, the Xerces parser prints these to the command line and returns. For example, the video codec has allowable values of rvg2svt, rv8 and rv9. If we modify the codec to read 'rm9' and re-run the validation above we get the following:
C:\Xerces>DOMCount -n -s -f job.2.0.rpjf
Error at file C:\Xerces\job.2.0.rpjf, line 102, char 46
  Message: Datatype error: Type:InvalidDatatypeValueException, Message:Value 'rm 9' is not in enumeration.
Errors occurred, no output available
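When validation is run from a script, the error report can be turned into structured data. The Python sketch below assumes the message layout matches the sample DOMCount output shown above; other Xerces versions may format errors differently, so the pattern is an assumption rather than a documented format. The function name parse_domcount_errors is hypothetical.

```python
import re

# Pattern assumed from the sample DOMCount output shown above; it is not a
# documented Xerces format and may need adjusting for other versions.
ERROR_RE = re.compile(
    r"Error at file (?P<file>.+?), line (?P<line>\d+), char (?P<char>\d+)\s+"
    r"Message: (?P<message>.+)"
)

SAMPLE_OUTPUT = (
    "Error at file C:\\Xerces\\job.2.0.rpjf, line 102, char 46\n"
    "  Message: Datatype error: Type:InvalidDatatypeValueException, "
    "Message:Value 'rm 9' is not in enumeration.\n"
    "Errors occurred, no output available\n"
)


def parse_domcount_errors(output):
    """Extract file, line, character position and message for each error."""
    errors = []
    for match in ERROR_RE.finditer(output):
        errors.append({
            "file": match.group("file"),
            "line": int(match.group("line")),
            "char": int(match.group("char")),
            "message": match.group("message").strip(),
        })
    return errors


if __name__ == "__main__":
    for error in parse_domcount_errors(SAMPLE_OUTPUT):
        print(error["file"], error["line"], error["char"], error["message"])
```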
XML Spy is a graphical user interface for editing and validating different types of XML-based documents. To perform XML Schema validation on a job file, make sure the file has the appropriate attributes added to the root element as described above and open the file in XML Spy. XML Spy will locate the XML Schema and perform validation upon opening the document. If there are no errors, the document is opened in a grid/table view. If there are errors, the user is taken to the location of the first error and given the opportunity to fix the error before re-validating. For example, if we modify the video codec to 'rm9' as above and open the job file in XML Spy, we get the following:
Once the error is corrected, the 'Revalidate' button can be pressed to re-check the file and report any additional errors if necessary.
XML Spy can also be used to edit documents. It uses the XML Schema to provide a list of elements that belong at a given location.
Visual Basic provides a number of functions to parse and validate XML documents. Much like Xerces, Visual Basic is a means of programmatically validating and manipulating XML documents. The functionality for parsing and validating XML documents is provided by the Microsoft XML DOM library for Visual Basic 6 and earlier and is a part of the built in functionality in Visual Basic .NET. The following example demonstrates opening and validating an XML document using Microsoft XML DOM from Visual Basic:
Private Sub LoadJob_Click()
    Dim xmldoc As New MSXML2.DOMDocument30

    If xmldoc.Load("C:\job.2.0.rpjf") = True Then
        ' The document loaded successfully.
        MsgBox "Your job file has been successfully loaded!"
        ' Perform some action here
    Else
        ' The document failed to load. Display Error.
        Dim strErrText As String
        Dim xPE As MSXML2.IXMLDOMParseError
        ' Obtain the ParseError object
        Set xPE = xmldoc.parseError
        With xPE
            strErrText = "Your XML Document failed to load " & _
                "due to the following error." & vbCrLf & _
                "Error #: " & .errorCode & ": " & .reason & _
                "Line #: " & .Line & vbCrLf & _
                "Line Position: " & .linepos & vbCrLf & _
                "Position In File: " & .filepos & vbCrLf & _
                "Source Text: " & .srcText & vbCrLf & _
                "Document URL: " & .url
        End With
        MsgBox strErrText
    End If

    Set xmldoc = Nothing 'Release memory
End Sub
The Load method loads the job file, locates the XML Schema document identified by the xsi:schemaLocation attribute and then performs validation on the file. If there are no errors, a message box indicating success is displayed. If there are any errors, an error dialog is displayed with information about the error. For example, suppose the same error used above (replacing rv9 with rm9 in the video codecName property) is introduced to the job file. When the above code is executed, the following dialog is displayed:
[ Section incomplete. Work in progress ]
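For comparison with the Visual Basic example above, the same load-then-report pattern can be sketched in Python. This is only an analogue under a stated limitation: the standard library's xml.etree.ElementTree checks well-formedness and does not perform XML Schema validation, so a schema-aware library would be needed for a full equivalent. The function name load_job is hypothetical.

```python
import xml.etree.ElementTree as ET


def load_job(xml_text):
    """Parse a job document; return (root, None) or (None, error text).

    Only well-formedness is checked here. ElementTree does not read
    xsi:schemaLocation or validate against an XML Schema.
    """
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as err:
        line, column = err.position
        text = (
            "Your XML document failed to load due to the following error.\n"
            "Error #: %s: %s\n"
            "Line #: line %d\n"
            "Line Position: %d" % (err.code, err, line, column)
        )
        return None, text


if __name__ == "__main__":
    root, error = load_job("<job><codec></job>")  # mismatched tag on purpose
    print(error if error else "Your job file has been successfully loaded!")
```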
Helix DNA Producer is a plug-in based architecture for encoding. Developers can add new inputs, prefilters, codecs, postfilters and destinations by writing plug-ins and placing these plug-ins in the appropriate directory. To use a custom plug-in, Helix DNA Producer settings files must be modified to identify the type of plug-in and the settings required by the plug-in. These modified settings files will not comply with the Helix DNA Producer XML Schemas unless the schemas are extended.
In order to facilitate validation of settings files containing custom plug-ins, the Helix DNA Producer Job File XML Schema can be extended by adding new types. Types are defined in an XML Schema type library that extends base types defined in the Helix DNA Producer XML Schema. The following base types, with namespace prefixes included, are defined in the Helix DNA Producer XML Schemas and can be used to derive new types for use with Helix DNA Producer settings files:
For example, if a new prefilter plug-in was developed to work with Helix DNA Producer, an XML Schema type library for that prefilter plug-in could be defined as follows:
<?xml version="1.0" encoding="UTF-8"?>
<schema targetNamespace="http://ns.acme.com/acme-producer-jobtypes.1.0"
        xmlns:acme="http://ns.acme.com/acme-producer-jobtypes.1.0"
        xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified"
        xml:lang="en">
  <annotation>
    <documentation>Type library for extending Helix DNA Producer Job Schema</documentation>
  </annotation>
  <complexType name="acme-prefilter-normalize">
    <complexContent>
      <extension base="rn:prefilter">
        <all>
          <element name="pluginName" default="acme-prefilter-normalize" minOccurs="0">
            <complexType>
              <simpleContent>
                <extension base="string">
                  <attribute name="type" type="string" use="required" fixed="string"/>
                </extension>
              </simpleContent>
            </complexType>
          </element>
          <element name="enabled" default="true">
            <complexType>
              <simpleContent>
                <extension base="boolean">
                  <attribute name="type" type="string" use="required" fixed="bool"/>
                </extension>
              </simpleContent>
            </complexType>
          </element>
          <element name="peak">
            <complexType>
              <simpleContent>
                <extension base="int">
                  <attribute name="type" type="string" use="required" fixed="uint"/>
                </extension>
              </simpleContent>
            </complexType>
          </element>
        </all>
      </extension>
    </complexContent>
  </complexType>
</schema>
The XML Schema type library above defines a new complex type acme-prefilter-normalize that extends the base type rn:prefilter defined in the job file XML Schema. The rn is the namespace identifier that is defined in the job file schema via the xmlns:rn attribute. The name of the prefilter, acme-prefilter-normalize, follows the recommended naming convention for Helix DNA Producer plug-ins.
The targetNamespace in the root element above defines a unique namespace for the XML type library. This namespace must be referenced in an XML document that uses this type library as we will see below.
By itself, this XML Schema is not valid because of the reference to the base type defined in the job file schema. However, when used in conjunction with the Helix DNA Producer job file schema, it extends the set of types that may be used in any job file that references it.
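This additional XML Schema may be referenced in a job file. First, the namespace and a namespace identifier for this XML Schema must be declared in the root element of the job file, and the location of the schema file must be added to the xsi:schemaLocation attribute: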
<job xmlns="http://ns.real.com/tools/job.2.0"
     xmlns:acme="http://ns.acme.com/acme-producer-jobtypes.1.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://ns.real.com/tools/job.2.0 job.2.0_generic-notype.xsd
                         http://ns.acme.com/acme-producer-jobtypes.1.0 jobtypes-acme.1.0.xsd">
The xmlns:acme
<prefilter xsi:type="acme:acme-prefilter-normalize">
  <acme:pluginName>stevemc</acme:pluginName>
  <acme:enabled>letmein</acme:enabled>
  <acme:peak>myserver.real.com</acme:peak>
</prefilter>
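A validating parser resolves the prefix in an xsi:type value to a namespace URI before looking up the type definition. The Python sketch below illustrates that lookup with the standard library only; it assumes the type library namespace is the targetNamespace from the schema above, uses a trimmed hypothetical job document, and simplifies prefix scoping (redeclared prefixes are not handled). The function name resolve_xsi_types is an illustrative assumption.

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Trimmed, hypothetical job document; the acme namespace is assumed to be
# the targetNamespace of the type library shown earlier.
SAMPLE_JOB = """<job xmlns="http://ns.real.com/tools/job.2.0"
     xmlns:acme="http://ns.acme.com/acme-producer-jobtypes.1.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <prefilter xsi:type="acme:acme-prefilter-normalize"/>
</job>"""


def resolve_xsi_types(xml_text):
    """Return (element name, type namespace URI, type name) per xsi:type."""
    prefixes = {}  # prefix -> URI; simplified, ignores prefix redeclaration
    results = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
            continue
        qname = payload.get(XSI_TYPE)
        if qname is None:
            continue
        # An unprefixed type name falls back to the default namespace ("").
        prefix, _, local = qname.rpartition(":")
        element_name = payload.tag.split("}")[-1]
        results.append((element_name, prefixes.get(prefix), local))
    return results


if __name__ == "__main__":
    for name, namespace, type_name in resolve_xsi_types(SAMPLE_JOB):
        print(name, namespace, type_name)
```

If the namespace returned for a type does not match the targetNamespace of the type library, a validating parser would not find the type definition.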