Saturday, January 19, 2013

XML Basics


Part of XML’s popularity is a result of its simplicity. When creating your own XML document, you need to remember only a few rules:

• XML elements are composed of a start tag (like <Name>) and an end tag (like </Name>). Content is placed between the start and end tags. If you include a start tag, you must also include a corresponding end tag. The only other option is to combine the two by creating an empty element, which includes a forward slash at the end and has no content (like <Name />). This is similar to the syntax for ASP.NET controls.

• Whitespace between elements is generally insignificant, and most applications ignore it. That means you can freely use tabs and hard returns to align your information.

• You can use only valid characters in the content for an element. Special characters, such as the angle brackets (< >) and the ampersand (&), have reserved meanings in markup, so you shouldn’t enter them directly as content. Instead, use the entity equivalents (&lt; and &gt; for the angle brackets, and &amp; for the ampersand). These equivalents are automatically converted back to the original characters when you read them into your program with the appropriate .NET classes.

• XML elements are case sensitive, so <ID> and <id> are completely different elements. 

• All elements must be nested in a root element. In the SuperProProductList example, the root element is <SuperProProductList>. As soon as the root element is closed, the document is finished, and you cannot add anything else after it. In other words, if you omit the <SuperProProductList> element and start with a <Product> element, you’ll be able to enter information for only one product; this is because as soon as you add the closing </Product>, the document is complete. (HTML has a similar rule and requires that all page content be nested in a root <html> element, but most browsers let you get away without following this rule.)

• Every element must be fully enclosed. In other words, when you open a subelement, you need to close it before you can close the parent. <Product><ID></ID></Product> is valid, but <Product><ID></Product></ID> isn’t. As a general rule, indent when you open a new element, because this will allow you to see the document’s structure and notice if you accidentally close the wrong element first.

• XML documents should start with an XML declaration like <?xml version="1.0"?>. This signals that the document contains XML and indicates any special text encoding. The declaration is technically optional in XML 1.0, and many XML parsers work fine even if it is omitted.

As long as you meet these requirements, your XML document can be parsed and displayed as a basic tree. This means your document is well formed, but it doesn’t mean it is valid. For example, you may still have your elements nested in the wrong hierarchy (for example, <ID><Product></Product></ID>), or you may have the wrong type of data in a given field (for example, <ID>Chair</ID><Name>2</Name>). You can impose these additional rules on your XML documents, as you’ll see later in this chapter when you consider XML schemas.

Elements are the primary units for organizing information in XML (as demonstrated with the SuperProProductList example), but they aren’t the only option. You can also use attributes.
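
The well-formedness rules listed above can be tried out with any XML parser. The following is a minimal sketch, not the .NET approach the chapter goes on to use: it assumes Python's standard xml.etree.ElementTree module as a stand-in parser, the helper name is_well_formed is hypothetical, and the product data in the sample document is made up for illustration (the original SuperProProductList listing is not reproduced here).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Each snippet exercises one of the rules from the list above.
samples = {
    "properly nested": "<Product><ID></ID></Product>",
    "overlapping tags": "<Product><ID></Product></ID>",
    "empty element": "<Product><Name /></Product>",
    "missing end tag": "<Product><Name>Chair</Product>",
    "case mismatch": "<Product><ID>1</id></Product>",
    "two root elements": "<Product></Product><Product></Product>",
    "raw ampersand": "<Name>Salt & Pepper</Name>",
    "escaped ampersand": "<Name>Salt &amp; Pepper</Name>",
    # Well formed, even though the hierarchy makes no sense for a product list.
    "odd hierarchy": "<ID><Product></Product></ID>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")

# A small indented document with a declaration and a single root element.
# The product values here are illustrative assumptions.
DOC = """<?xml version="1.0"?>
<SuperProProductList>
    <Product>
        <ID>1</ID>
        <Name>Chair</Name>
    </Product>
</SuperProProductList>"""

root = ET.fromstring(DOC)
print(root.tag)                        # name of the root element
print(root.find("Product/Name").text)  # content of the nested Name element

# Entity equivalents are converted back to the original characters on read.
name = ET.fromstring("<Name>Salt &amp; Pepper &lt;large&gt;</Name>")
print(name.text)

# Going the other way, escape() produces the entity equivalents for you.
print(escape("Salt & Pepper <large>"))
```

Note that the "odd hierarchy" sample passes this check. A parser only confirms that a document is well formed; catching misplaced elements or wrong data types requires validation against a schema.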
