Part 4

Working with XML in ADO.NET

XML has become the standard format for data storage. More and more data is being stored in the form of XML. It is becoming increasingly important to learn how to process XML documents.

Traditionally XML documents are processed using DOM (document object model), which is a set of classes standardized by W3C (World Wide Web Consortium) . So process an XML document one has to use DOM. Though using DOM is particularly a difficult task, learning any new API ( a set of classes and methods) is adding extra burden. We wish we could process XML document in the same manner as a table in a relational table. Well Microsoft listened to our wish and delivered ADO.NET.

ADO.NET is the new set of classes used to access database from .NET applications. It provides DataSet, which can be used to access XML documents. It allows programmer to read, update and create XML documents with the same set of classes and methods that we use to manipulate relational databases like Oracle and SQL Server. This allows programmer to master one set of classes and methods (ADO.NET) and use it with both relational databases and XML documents.

This article is about using DataSet and related classes to process XML documents. We will work with a simple XML document called PLAYERS.XML, which contains information about players of a Soccer team. This article is divided into different section. In each section we concentrate on one task.

What is XML?

What is ADO.NET?

How to read XML document?

Using schema

Adding elements to XML document using Dataset

What is XML?

XML stands for eXtensible Markup Language. It is about storing data using tags. For example, if you want to store information about a player then you put data related to player using tags as follows:

<player>
  <name>Ronaldo</name>
  <position>Forward</position>
  <age>25</age>
</player>    

Each value is stored with tags. In the above document, we have player tag to store information about player. Each attribute is stored again with a tag - name, position and age.

If you want to store details of multiple players then the same can be extended as follows:


<players>
  <player>
      <name>Ronaldo</name>
      <position>Forward</position>
      <age>25</age>
  </player>    
  <player>
      <name>Roberto Carlos</name>
      <position>Defender</position>
      <age>29</age>
  </player>    
  <player>
      <name>Rivaldo</name>
      <position>Midfielder</position>
      <age>30</age>
  </player>    
</players>

Each XML document should have a root element, which encompasses all other elements and unique. In the above example as we have multiple player tags, we used players tag to encompass all player tags.

XML Schema is the description of the structure of the XML document. It specifies the hierarchy of the elements and attributes of each element like name, data type etc.

By using XML Schema, it is possible to specify exactly how you want your XML document to be. It also enables data to be validated according to the data type.

What is ADO.NET?

ADO.NET is new way to access data from .NET Framework. It is mainly consisting of two sets of classes. One set is used to access SQL Server using SQL Server .NET Data Provider, which access SQL Server directly. The other provider is a bridge. It accesses an OLE DB Provider to access databases other than SQL Server.

ADO.NET provides classes like Connect, DataReader, Command and DataAdapter. But the class that we use in this article is DataSet.

DataSet contains a collection of tables in memory. DataSet also contains relation objects establishing relationship between tables in DataSet.

How to read XML Document?

As I have already mentioned, prior to ADO.NET, one has to access XML document using DOM. But with DataSet of ADO.NET, it is possible to access and manipulate an XML document.

The following example program is in VB.NET. It shows how you can read and display the content of an XML document called PLAYERS.XML.


' import required namespaces

imports system
imports system.data
imports system.xml

Module Test
  public sub main()
     
     dim ds as new DataSet
     dim r  as datarow
     dim i  as integer
 
       ' read an XML file into dataset. Data read using READXML will 
       ' be taken as a table

       ds.ReadXML("c:\srikanth\vb\players.xml")

       ' display data from the file

       for each r in  ds.tables(0).rows
         for i = 0 to  ds.tables(0).columns.count  - 1
            console.write( r.item(i) &  "  " )
         next
         console.writeline
       next
  end sub

end module
    

ReadXML method takes the name of the file that contains XML data and reads the content of the file into DataSet as a DataTable. Elements of XML document - Player - become rows in the table.

Tables collection represents tables in dataset. The expression ds.tables(0) provides access to first table in the dataset.

Columns collection of a table represents columns of the table. So the expression ds.tables(0).columns.count returns the number of columns in the first table of the dataset.

Item collection of a row provides access to values of columns in the row. You can access a column either by index of the column or the name of the column.

When you run the above program the following output will be generated.


Ronaldo  Forward  25
Roberto Carlos  Defender  29
Rivaldo  Midfielder  30

Using schema

Schema of an XML document specifies the structure to which XML document must adhere. It is used to specify the data types of elements. For example, in the above document (players.xml), age must be an integer. But as of now, it can be anything as default data type is string. So we can create and load schema so that the data is validated at the time of loading.

The following is schema that can be used to specify the structure of players.xml file. The schema itself is an XML document and stored with the name playerschema.xml.


<?xml version="1.0"?>
<xs:schema id="players" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="players" msdata:IsDataSet="true">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="player">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string" minOccurs="0" />
              <xs:element name="position" type="xs:string" minOccurs="0" />
              <xs:element name="age" type="xs:int" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

The above schema specifies that the document contains an element called player and it contains three elements - name, position and age. The type of name and position is string and type of age is int.

If you load schema first and then data ReadXML methods checks whether the data in the document confirms with the structure in the schema.

The following VB.NET program loads schema from playerschema.xml first and then data from player.xml.


imports system
imports system.data
imports system.xml

Module Test
  public sub main()
     
     dim ds as new DataSet
     dim r  as datarow
     dim i  as integer
 
       ds.readxmlschema("c:\srikanth\vb\playerschema.xml")

       try
           ds.ReadXML("c:\srikanth\vb\players.xml")
       catch  ex as Exception
           console.writeline( "Error while loading data : " + ex.Message)
           exit sub
       end try

        
       ' display data from the file
       for each r in  ds.tables(0).rows
         for i = 0 to  ds.tables(0).columns.count  - 1
            console.write( r.item(i) &  "  " )
         next
         console.writeline
       next


  end sub

end module

The above program executes ReadXML method in Try block as it may throw an exception if data in the XML document is not in the required format.

If you change age in XML document to a non-numeric then it displays an error and program is terminated, otherwise the data from the document is displayed.

Adding elements to XML document using Dataset

It is possible to modify and XML document using dataset. To change data in XML document following the steps:
  1. Load XML document into a table of DataSet
  2. Make changes to table of dataset
  3. Write the content of table back to XML document on the disk

The following example loads data from players.xml into a table of dataset. Then it adds a new row to the table and write schema of the table into playerschema.xml and data into players.xml.


imports system
imports system.data
imports system.xml

Module Test
  public sub main()
     
     dim ds as new DataSet
     dim dr as datarow
 
       ds.ReadXML("c:\srikanth\vb\players.xml")

       'add one more player 

       dr = ds.tables(0).newrow()   ' add a new empty row
       dr.item(0) = "Cafu"  
       dr.item(1) = "Defender"
       dr.item(2) = 27

       ds.tables(0).rows.add(dr)   ' add new row to rows collection
        
       ds.writexml("c:\srikanth\vb\players.xml") 
       ds.writexmlschema("c:\srikanth\vb\playerschema.xml") 


  end sub

end module

Method WriteXML writes data of table into an XML document. WriteXMLSchema writes schema into the given file. Open players.xml in Internet Explorer and see the whether a new row is added to players.xml.

Manipulating XML documents doesn't need knowledge of DOM and instead ADO.NET itself can be used for that purpose.

Keep Learning

P.Srikanth