Mastering everyday XML tasks in PowerShell

PowerShell has awesome XML support. It is not obvious at first, but with a little help from your friends here at PowerShellMagazine.com, you’ll soon solve everyday XML tasks – even pretty complex ones – in no time.

So let’s check out how you put very simple PowerShell code to work to get the things done that used to be so mind-blowingly complex in the pre-PowerShell era.

Let’s create an XML document from scratch, add new data sets, change pieces of information, add new data, remove data, and save an updated version of it to a new well-formed XML file.

Creating New XML Documents

Creating completely fresh XML documents from scratch used to be a tedious task. Many scripters resorted to creating XML files as plain text. While that’s OK, it is error-prone. Chances are that typos and case issues sneak in, and you may find yourself in an unfriendly world of malformed and dysfunctional XML.

No more, because there’s a buddy that can help you create XML documents: the XMLTextWriter object. It shields the complexity of dealing with the raw XML object model, and instead assists you in writing your pieces of information to an XML file.

To begin this story, let’s create a fairly complex XML document that the upcoming examples can use to play with. The goal is to create an XML document that has all the typical things in it: nodes, attributes, data sections, and comments.

# this is where the document will be saved:
$Path = "$env:temp\inventory.xml"

# get an XMLTextWriter to create the XML
$XmlWriter = New-Object System.Xml.XmlTextWriter($Path,$Null)

# choose a pretty formatting:
$xmlWriter.Formatting = 'Indented'
$xmlWriter.Indentation = 1
$XmlWriter.IndentChar = "`t"

# write the header
$xmlWriter.WriteStartDocument()

# set XSL statements
$xmlWriter.WriteProcessingInstruction("xml-stylesheet", "type='text/xsl' href='style.xsl'")

# create root element "machines" and add some attributes to it
$XmlWriter.WriteComment('List of machines')
$xmlWriter.WriteStartElement('Machines')
$XmlWriter.WriteAttributeString('current', $true)
$XmlWriter.WriteAttributeString('manager', 'Tobias')

# add a couple of random entries
for($x=1; $x -le 10; $x++)
{
    $server = 'Server{0:0000}' -f $x
    # each part of an IPv4 address must be in the range 0 to 255
    $ip = '{0}.{1}.{2}.{3}' -f  (0..255 | Get-Random -Count 4)

    $guid = [System.GUID]::NewGuid().ToString()

    # each data set is called "machine", add a random attribute to it:
    $XmlWriter.WriteComment("$x. machine details")
    $xmlWriter.WriteStartElement('Machine')
    $XmlWriter.WriteAttributeString('test', (Get-Random))

    # add three pieces of information:
    $xmlWriter.WriteElementString('Name',$server)
    $xmlWriter.WriteElementString('IP',$ip)
    $xmlWriter.WriteElementString('GUID',$guid)

    # add a node with attributes and content:
    $XmlWriter.WriteStartElement('Information')
    $XmlWriter.WriteAttributeString('info1', 'some info')
    $XmlWriter.WriteAttributeString('info2', 'more info')
    $XmlWriter.WriteRaw('RawContent')
    $xmlWriter.WriteEndElement()

    # add a node with CDATA section:
    $XmlWriter.WriteStartElement('CodeSegment')
    $XmlWriter.WriteAttributeString('info3', 'another attribute')
    $XmlWriter.WriteCData('this is untouched code and can contain special characters /\@<>')
    $xmlWriter.WriteEndElement()

    # close the "machine" node:
    $xmlWriter.WriteEndElement()
}

# close the "machines" node:
$xmlWriter.WriteEndElement()

# finalize the document:
$xmlWriter.WriteEndDocument()
$xmlWriter.Flush()
$xmlWriter.Close()

notepad $path

This script generates a fake server inventory with a lot of random information. The result is opened in notepad and will look similar to this:

<?xml version="1.0"?>
<?xml-stylesheet type='text/xsl' href='style.xsl'?>
<!--List of machines-->
<Machines current="True" manager="Tobias">
 <!--1. machine details-->
 <Machine test="578163632">
  <Name>Server0001</Name>
  <IP>31.248.95.170</IP>
  <GUID>51cb0dfb-75ed-4967-8392-47d87596c73c</GUID>
  <Information info1="some info" info2="more info">RawContent</Information>
  <CodeSegment info3="another attribute"><![CDATA[this is untouched code and can contain special characters /\@<>]]></CodeSegment>
 </Machine>
 <!--2. machine details-->
 <Machine test="124214010">
  <Name>Server0002</Name>
  <IP>33.60.233.89</IP>
  <GUID>9618b8bc-c200-46ce-b423-ee030555242d</GUID>
  <Information info1="some info" info2="more info">RawContent</Information>
  <CodeSegment info3="another attribute"><![CDATA[this is untouched code and can contain special characters /\@<>]]></CodeSegment>
 </Machine>
(...)
</Machines>

The purpose of this XML document is two-fold: it serves as an example of how you can create XML files from scratch, and it serves as sample data for the following exercises.

Just assume this was an XML file with relevant information. You can apply the tactics you are about to learn to any well-formed XML file.
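XMLTextWriter still works, but the .NET documentation recommends creating writers through XmlWriter.Create() together with an XmlWriterSettings object. The following is a minimal sketch of that variant, not a replacement for the script above. The file name inventory-small.xml and the sample values are assumptions made up for illustration. The try/finally block makes sure the file is closed even if one of the write calls fails.

```powershell
# hypothetical output file, chosen for this sketch only
$SmallPath = Join-Path ([System.IO.Path]::GetTempPath()) 'inventory-small.xml'

# formatting options live in a settings object instead of writer properties
$settings = New-Object System.Xml.XmlWriterSettings
$settings.Indent = $true
$settings.IndentChars = "`t"

$writer = [System.Xml.XmlWriter]::Create($SmallPath, $settings)
try
{
    $writer.WriteStartDocument()
    $writer.WriteStartElement('Machines')
    $writer.WriteStartElement('Machine')
    $writer.WriteElementString('Name', 'Server0001')
    $writer.WriteElementString('IP', '10.0.0.1')
    $writer.WriteEndElement()   # closes "Machine"
    $writer.WriteEndElement()   # closes "Machines"
    $writer.WriteEndDocument()
}
finally
{
    # Close() flushes pending output and releases the file
    $writer.Close()
}
```

The write calls are the same ones used with XMLTextWriter, so the larger script can be converted step by step if you prefer this style.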

Attention: XMLTextWriter does a lot of magic for you, but you are responsible for creating meaningful content. One issue that can easily burn you is a malformed node name. Node names must not contain spaces.

So while “CodeSegment” is OK, “Code Segment” is not. XMLTextWriter does not check the name and writes it as-is. When the file is loaded later, the XML parser reads a node named “Code” followed by an attribute named “Segment”, and finally chokes on the fact that the attribute was never assigned a value.
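If node names come from variables or user input, it can help to check them before writing. Here is a minimal sketch that uses the .NET class System.Xml.XmlConvert, which is not part of the script above: VerifyName() throws an exception for an invalid name, and EncodeLocalName() escapes invalid characters. The two sample names are made up.

```powershell
# sample names: one valid, one containing a space
$names = 'CodeSegment', 'Code Segment'

$results = foreach ($name in $names)
{
    try
    {
        # returns the name unchanged if it is valid, throws otherwise
        $null = [System.Xml.XmlConvert]::VerifyName($name)
        $valid = $true
    }
    catch
    {
        $valid = $false
    }

    New-Object PSObject -Property @{
        Name    = $name
        IsValid = $valid
        # escaped form that can be used safely as a node name
        Encoded = [System.Xml.XmlConvert]::EncodeLocalName($name)
    }
}

$results | Select-Object -Property Name, IsValid, Encoded
```

Whether you reject invalid names or encode them is a design decision. Encoded names are safe but harder to read and to query.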

Finding Information in XML Files

One common task is to extract information from an XML file. Let’s assume you need a list of machines and their IP addresses. Provided you have generated the sample XML file above, then this is all it takes to create the report:

# this is where the XML sample file was saved:
$Path = "$env:temp\inventory.xml"

# load it into an XML object:
$xml = New-Object -TypeName XML
$xml.Load($Path)
# note: if your XML is malformed, you will get an exception here
# always make sure your node names do not contain spaces

# simply traverse the nodes and select the information you want:
$Xml.Machines.Machine | Select-Object -Property Name, IP

The result will look similar to this:

Name          IP
----          --
Server0001    31.248.95.170
Server0002    33.60.233.89
Server0003    226.6.1.30
Server0004    139.30.8.110
Server0005    94.104.253.8
Server0006    202.80.178.61
Server0007    22.217.227.159
Server0008    253.72.25.212
Server0009    233.147.116.60
Server0010    41.173.220.129

Note: Some of you may wonder why I used an XML object in the first place. Often you find code like this:

# this is where the xml sample file was saved:
$Path = "$env:temp\inventory.xml"

# load it into an XML object:
[XML]$xml = Get-Content $Path

The simple reason is performance. Reading the XML file as plain text via Get-Content and then casting it to XML in a second step is a very expensive approach. Even though our XML file isn’t that large, the latter solution takes almost 7 times as long as the first one, and the gap adds up with larger XML files.
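To check the difference on your own system, you can time both approaches with Measure-Command. This is only a rough sketch: the file name loadtest.xml and its generated content are assumptions for the test, and the ratio you see depends on file size, hardware, and PowerShell version.

```powershell
# build a throwaway test file (hypothetical name) with 2000 entries
$TestPath = Join-Path ([System.IO.Path]::GetTempPath()) 'loadtest.xml'
$lines = @('<Machines>')
$lines += 1..2000 | ForEach-Object { '<Machine><Name>Server{0:0000}</Name></Machine>' -f $_ }
$lines += '</Machines>'
Set-Content -Path $TestPath -Value $lines -Encoding UTF8

# approach 1: let the XML object read the file itself
$viaLoad = New-Object -TypeName XML
$timeLoad = Measure-Command { $viaLoad.Load($TestPath) }

# approach 2: read the text line by line, then cast it to XML
$timeCast = Measure-Command { [XML](Get-Content -Path $TestPath) }

'Load(): {0:N1} ms' -f $timeLoad.TotalMilliseconds
'Cast  : {0:N1} ms' -f $timeCast.TotalMilliseconds
```

A single run is noisy, so repeat the measurement a few times before drawing conclusions.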

So whenever you want to load an XML file, make sure you get an XML object and use its Load() method. This method is versatile enough, by the way, to also accept URLs, so you can use a URL to your favorite RSS feed as well – provided you have direct Internet access and no proxy settings to configure.

Picking Particular Instances

Let’s assume you do not want a list of all servers, but instead just want to look up the IP address and the information attribute info1 for a specific server in your list. You could use the same approach like this:

$Xml.Machines.Machine |
Where-Object { $_.Name -eq 'Server0009' } |
Select-Object -Property IP, {$_.Information.info1}

This would get you the IP address for “Server0009” plus the info1 attribute. Instead of querying all elements and then picking the one you are after on the client side, you can also use XPath, an XML query language:

$item = Select-XML -Xml $xml -XPath '//Machine[Name="Server0009"]'
$item.Node | Select-Object -Property IP, {$_.Information.Info1}

The XPath query “//Machine[Name="Server0009"]” looks for all “Machine” nodes that have a sub-node called “Name” with a value of “Server0009”.

Important: XPath is case-sensitive, so if the node name is “Machine”, then you cannot query for “machine”.
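The effect is easy to miss because a query with the wrong case does not raise an error; it simply returns nothing. A small sketch with an inline sample document (made up so that it runs on its own):

```powershell
# small inline sample so the sketch does not depend on the inventory file
$sample = [XML]'<Machines><Machine><Name>Server0009</Name></Machine></Machines>'

# correct case: returns the node
$hit = @(Select-Xml -Xml $sample -XPath '//Machine[Name="Server0009"]')

# wrong case in the node name: returns nothing, and no error is raised
$miss = @(Select-Xml -Xml $sample -XPath '//machine[Name="Server0009"]')

# the comparison of values is case-sensitive, too
$missValue = @(Select-Xml -Xml $sample -XPath '//Machine[Name="server0009"]')

'{0} / {1} / {2} matches' -f $hit.Count, $miss.Count, $missValue.Count
# prints: 1 / 0 / 0 matches
```

So if a query unexpectedly comes back empty, checking the case of node names and values is a good first step.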

As a side note, in both approaches you need a script block to access attributes because the attribute “info1” is part of a sub-node “Information”. As always in these scenarios, you can use a hash table to assign a better name to that piece of information:

$info1 = @{Name='AdditionalInfo'; Expression={$_.Information.Info1}}
$item = Select-XML -Xml $xml -XPath '//Machine[Name="Server0009"]'
$item.Node | Select-Object -Property IP, $info1

The result will look similar to this:

IP              AdditionalInfo
--              --------------
97.196.140.12   some info

XPath is an extremely powerful XML query language. You can find information on its syntax all over the Internet (check these links for example: http://www.w3schools.com/xpath/ and http://go.microsoft.com/fwlink/?LinkId=143609). When you read these documents, keep in mind that .NET, and with it Select-XML, implements XPath 1.0. Core functions of that version such as last() or position() work, but functions that were introduced with XPath 2.0, like lower-case(), are not supported here.
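To give you an idea of what else a query can express, here are a few more XPath 1.0 filters. This is a sketch against a small inline document; the machine names, IP addresses, and attribute values are made up.

```powershell
$sample = [XML]@'
<Machines>
  <Machine test="1"><Name>Server0001</Name><IP>10.0.0.1</IP></Machine>
  <Machine test="2"><Name>Server0002</Name><IP>10.0.0.2</IP></Machine>
  <Machine test="3"><Name>Backup0001</Name><IP>10.0.1.1</IP></Machine>
</Machines>
'@

# filter on an attribute value (attributes are prefixed with "@")
$byAttribute = @(Select-Xml -Xml $sample -XPath '//Machine[@test="2"]')

# filter on the beginning of a sub-node's text
$byPrefix = @(Select-Xml -Xml $sample -XPath '//Machine[starts-with(Name, "Server")]')

# filter on the position inside the parent node
$firstTwo = @(Select-Xml -Xml $sample -XPath '//Machine[position() <= 2]')

# select the text nodes directly instead of whole machines
$ips = Select-Xml -Xml $sample -XPath '//Machine/IP/text()' |
  ForEach-Object { $_.Node.Value }

$byPrefix | ForEach-Object { $_.Node.Name }
$ips
```

Each query returns the same kind of result objects as before, so everything shown for “Server0009” applies to these results as well.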

Changing XML Content

Often, you will want to update information in an XML document. Rather than parsing the XML yourself, simply stick to the techniques you just learned.

So if you wanted to update Server0006 and assign it a new name and a different IP address, this is what you would do:

$item = Select-XML -Xml $xml -XPath '//Machine[Name="Server0006"]'
$item.node.Name = "NewServer0006"
$item.node.IP = "10.10.10.12"
$item.node.Information.Info1 = 'new attribute info'

$NewPath = "$env:temp\inventory2.xml"
$xml.Save($NewPath)
notepad $NewPath

As you can see, updating information is simple, and all changes you make are applied automatically to the underlying XML object. All you need to do is to save the changed XML object to file to make your changes permanent. The result is displayed in the Notepad editor and will look similar to this:

<!--6. machine details-->
  <Machine test="559669990">
    <Name>NewServer0006</Name>
    <IP>10.10.10.12</IP>
    <GUID>cca8df99-78e1-48e0-8c4d-193c6d4acbd2</GUID>
    <Information info1="new attribute info" info2="more info">RawContent</Information>
    <CodeSegment info3="another attribute"><![CDATA[this is untouched code and can contain special characters /\@<>]]></CodeSegment>
  </Machine>

You have just made changes to an existing XML document in no time, without tricky parsing, and without the risk of breaking the XML structure.

In the same way, you can make bulk adjustments. Let’s assume all the servers are to get brand new names. Instead of “ServerXXXX”, the machines now need to be named like “Prod_ServerXXXX”. Here’s the solution:

Foreach ($item in (Select-XML -Xml $xml -XPath '//Machine'))
{
    $item.node.Name = 'Prod_' + $item.node.Name
}

$NewPath = "$env:temp\inventory2.xml"
$xml.Save($NewPath)
notepad $NewPath

Note how all server names in the XML document have been updated. Select-XML this time won’t return just one object but many, one for each server. This is because XPath this time selects all “Machine” nodes without special filtering. That’s why all of these nodes need to be processed in a foreach loop.

Inside of the loop, the node “Name” is assigned a new value, and once all “Machine” nodes are updated, the XML document is saved and opened in Notepad.

You may argue that in this example, prepending the server name with “Prod_” is really a trivial change, and that is true. There may be more complex requirements. However, the focus here is to show how you fundamentally change XML data, not how you do sophisticated string operations.

Still, if you ask yourself how you would, for example, replace “ServerXXXX” with “PCXX” (including turning a 4-digit number into a 2-digit number, so this definitely is not a trivial change), here is a solution as well:

foreach($item in (Select-XML -Xml $xml -XPath '//Machine'))
{
    if ($item.node.Name -match 'Server(\d{4})')
    {
      $item.node.Name = 'PC{0:00}' -f [Int]$matches[1]
    }
}
$NewPath = "$env:temp\inventory2.xml"
$xml.Save($NewPath)
notepad $NewPath

This time, a regular expression extracts the numeric part of the original server name, then the -f operator reformats the number and adds it to the new server prefix.

Neither regular expressions nor number formatting are the focus of this article. The important part is to see that you are free to use whatever technique you like to construct the new server name. At the end of the day, changing the XML content always sticks to the same rules, though.

Adding New Data

Occasionally, updating data is not enough. You may want to add a new computer to the list. Again, this is straightforward. You simply pick an existing node, clone it, then update its content and append it to the parent of your liking. This way, you do not have to create the complex node structure yourself and can be certain that the new node is structured just like any of the existing nodes.

This will add a new machine to the list of machines:

# clone an existing node structure
$item = Select-XML -Xml $xml -XPath '//Machine[1]'
$newnode = $item.Node.CloneNode($true)

# update the information as needed
# all other information is defaulted to the values from the original node
$newnode.Name = 'NewServer'
$newnode.IP = '1.2.3.4'

# get the node you want the new node to be appended to:
$machines = Select-XML -Xml $xml -XPath '//Machines'
$machines.Node.AppendChild($newnode)

$NewPath = "$env:temp\inventory2.xml"
$xml.Save($NewPath)
notepad $NewPath

Since the node you are adding is cloned from an existing node, all information in this new node is copied from the existing node. Information that you do not update will keep the old values.

And what if you wanted to add the new node to the top of the list? Simply use InsertBefore() instead of AppendChild():

# add it to the top of the list:
$machines.Node.InsertBefore($newnode, $item.node)

Likewise, you can basically insert the new node anywhere. This would insert it right after Server0007:

# add it after "Server0007":
$parent = Select-XML -Xml $xml -XPath '//Machine[Name="Server0007"]'
$machines.Node.InsertAfter($newnode, $parent.node)
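A related task is to add a machine only if it is not listed yet, and to update the existing entry otherwise. Here is one possible sketch that combines the XPath lookup with the clone technique. The function name Set-Machine and the inline sample document are hypothetical and only serve as illustration.

```powershell
function Set-Machine
{
    param
    (
        [Parameter(Mandatory=$true)][XML]$Xml,
        [Parameter(Mandatory=$true)][String]$Name,
        [Parameter(Mandatory=$true)][String]$IP
    )

    # look for an existing entry with that name
    # note: $Name is inserted into the query as-is, so it must not contain quotes
    $existing = Select-Xml -Xml $Xml -XPath "//Machine[Name='$Name']" |
      Select-Object -First 1

    if ($existing)
    {
        # found: update the entry in place
        $existing.Node.IP = $IP
    }
    else
    {
        # not found: clone the first machine as a template and append the copy
        $template = Select-Xml -Xml $Xml -XPath '//Machine[1]'
        $newnode = $template.Node.CloneNode($true)
        $newnode.Name = $Name
        $newnode.IP = $IP
        $null = $template.Node.ParentNode.AppendChild($newnode)
    }
}

$sample = [XML]'<Machines><Machine><Name>Server0001</Name><IP>10.0.0.1</IP></Machine></Machines>'
Set-Machine -Xml $sample -Name 'Server0001' -IP '10.0.0.99'   # updates the existing entry
Set-Machine -Xml $sample -Name 'Server0002' -IP '10.0.0.2'    # appends a new entry
$sample.Machines.Machine | Select-Object -Property Name, IP
```

Like the clone example, this requires at least one existing “Machine” node to serve as a template, and all values you do not set are copied from it.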

Removing XML Content

Deleting data entirely from your XML file is just as easy. If you wanted to remove Server0007 from your list, here’s how:

# remove "Server0007":
$item = Select-XML -Xml $xml -XPath '//Machine[Name="Server0007"]'
$null = $item.Node.ParentNode.RemoveChild($item.node)
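The same pattern works for more than one node at a time. The sketch below removes all machines whose name starts with “Test” and then all comments. The inline document and the “Test” prefix are made up for illustration. The matches are collected in an array first, and the nodes are removed afterwards.

```powershell
$sample = [XML]@'
<Machines>
  <!--1. machine details-->
  <Machine><Name>Server0001</Name></Machine>
  <!--2. machine details-->
  <Machine><Name>Test0002</Name></Machine>
  <Machine><Name>Test0003</Name></Machine>
</Machines>
'@

# collect the matches first, then remove them one by one
$obsolete = @(Select-Xml -Xml $sample -XPath '//Machine[starts-with(Name, "Test")]')
foreach ($item in $obsolete)
{
    $null = $item.Node.ParentNode.RemoveChild($item.Node)
}

# comments are nodes, too, and can be removed the same way
foreach ($item in @(Select-Xml -Xml $sample -XPath '//comment()'))
{
    $null = $item.Node.ParentNode.RemoveChild($item.Node)
}

$sample.OuterXml
```

As before, the changes only affect the XML object in memory until you call Save().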

Enormous Power at Your Fingertips

With the examples presented, you can now manage the most commonly needed XML manipulations in just a couple of lines of code. It is well worth investing some time into improving your XML and XPath proficiency – you can do amazing things with them.

And for those of you who have stuck with me this long, I have a little present for you: a great little tool I use very often that can be very helpful for you, too, I am sure. It uses the exact same tactics you just heard about. Here’s the story:

ConvertTo-XML can convert any object into XML, and since XML is a hierarchical data format, preserving structure up to a given depth, it is an excellent way of examining nested object properties. So you can “unfold” an object structure and look at all of its properties, even the deeply nested ones.

Without XML and XPath, all you could do is look at plain XML and search for information yourself. For example, if you wanted to find out where exactly the $host object stores PowerShell’s color information, you could do this (which might not be such a good idea after all because you get flooded with raw XML information):

$host | ConvertTo-XML -Depth 5 | Select-Object -ExpandProperty outerXML

With the knowledge just presented, you could now take the raw XML and extract and filter the object properties.
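Here is a minimal sketch of that idea. ConvertTo-XML writes each property as a “Property” node with “Name” and “Type” attributes, and nested objects become nested “Property” nodes, so XPath can find a property by name at any depth. The sample object and its property names are made up.

```powershell
# sample object with one nested level (property names are hypothetical)
$object = New-Object PSObject -Property @{
    Title  = 'Console'
    Colors = New-Object PSObject -Property @{ ErrorForegroundColor = 'Red' }
}

$objectXml = $object | ConvertTo-Xml -Depth 3

# "//Property" searches all levels; the filter picks the property by its name attribute
$found = Select-Xml -Xml $objectXml -XPath '//Property[@Name="ErrorForegroundColor"]' |
  ForEach-Object {
    New-Object PSObject -Property @{
      Name   = $_.Node.GetAttribute('Name')
      Value  = $_.Node.InnerText
      # the enclosing node tells you which property holds the value
      Parent = $_.Node.ParentNode.GetAttribute('Name')
    }
  }

$found | Select-Object -Property Name, Value, Parent
```

GetAttribute() is used here instead of dot notation because every node already has a built-in Name property, and reading the attribute explicitly avoids any ambiguity.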

So here’s the promised function called Get-ObjectProperty which works a little bit like Get-Member on steroids. It can tell you which property inside an object holds the value you are after. Have a look:

PS> $host | Get-ObjectProperty -Depth 2 -Name *color*

Name                    Value                   Path                    Type
----                    -----                   ----                    ----
TokenColors                                     $obj1.PrivateData.To... Microsoft.PowerShel...
ConsoleTokenColors                              $obj1.PrivateData.Co... Microsoft.PowerShel...
XmlTokenColors                                  $obj1.PrivateData.Xm... Microsoft.PowerShel...
ErrorForegroundColor    #FFFF0000               $obj1.PrivateData.Er... System.Windows.Medi...
ErrorBackgroundColor    #FFFFFFFF               $obj1.PrivateData.Er... System.Windows.Medi...
WarningForegroundColor  #FFFF8C00               $obj1.PrivateData.Wa... System.Windows.Medi...
WarningBackgroundColor  #00FFFFFF               $obj1.PrivateData.Wa... System.Windows.Medi...
VerboseForegroundColor  #FF00FFFF               $obj1.PrivateData.Ve... System.Windows.Medi...
VerboseBackgroundColor  #00FFFFFF               $obj1.PrivateData.Ve... System.Windows.Medi...
DebugForegroundColor    #FF00FFFF               $obj1.PrivateData.De... System.Windows.Medi...
DebugBackgroundColor    #00FFFFFF               $obj1.PrivateData.De... System.Windows.Medi...
ConsolePaneBackgroun... #FF012456               $obj1.PrivateData.Co... System.Windows.Medi...
ConsolePaneTextBackg... #FF012456               $obj1.PrivateData.Co... System.Windows.Medi...
ConsolePaneForegroun... #FFF5F5F5               $obj1.PrivateData.Co... System.Windows.Medi...
ScriptPaneBackground... #FFFFFFFF               $obj1.PrivateData.Sc... System.Windows.Medi...
ScriptPaneForeground... #FF000000               $obj1.PrivateData.Sc... System.Windows.Medi...

This will return all nested properties inside of $host that have “Color” in their names. Console output most likely is truncated, so you are better off displaying the information in a grid view window:

$host | Get-ObjectProperty -Depth 2 -Name *color* | Out-GridView

Note the column “Path”: this property specifies exactly how you would access a given nested property. In the example, Get-ObjectProperty walks two levels deep inside the object hierarchy. Greater depths will unfold even more information but will also pollute the results with more irrelevant noise information.

While you can pipe in multiple objects, it is best to pipe only one object due to the large amount of resulting data. This line would list all nested properties in a process object, five levels deep, that have a numeric value:

PS> Get-Process -id $pid | Get-ObjectProperty -Depth 5 -IsNumeric

Name                    Value                   Path                    Type
----                    -----                   ----                    ----
Handles                 684                     $obj1.Handles           System.Int32
VM                      1010708480              $obj1.VM                System.Int32
WS                      291446784               $obj1.WS                System.Int32
PM                      251645952               $obj1.PM                System.Int32
NPM                     71468                   $obj1.NPM               System.Int32
CPU                     161,0398323             $obj1.CPU               System.Double
BasePriority            8                       $obj1.BasePriority      System.Int32
HandleCount             684                     $obj1.HandleCount       System.Int32
Id                      4560                    $obj1.Id                System.Int32
Size                    264                     $obj1.MainModule.Size   System.Int32
ModuleMemorySize        270336                  $obj1.MainModule.Mod... System.Int32
FileBuildPart           9421                    $obj1.MainModule.Fil... System.Int32
FileMajorPart           6                       $obj1.MainModule.Fil... System.Int32
FileMinorPart           3                       $obj1.MainModule.Fil... System.Int32
ProductBuildPart        9421                    $obj1.MainModule.Fil... System.Int32
ProductMajorPart        6                       $obj1.MainModule.Fil... System.Int32
ProductMinorPart        3                       $obj1.MainModule.Fil... System.Int32
Size                    264                     $obj1.Modules[0].Size   System.Int32
ModuleMemorySize        270336                  $obj1.Modules[0].Mod... System.Int32
(...)

And this line would return all nested properties of the spooler service object that are of type “String”:

PS> Get-Service -Name spooler | Get-ObjectProperty -Type System.String

Name                    Value                   Path                    Type
----                    -----                   ----                    ----
Name                    spooler                 $obj1.Name              System.String
Name                    RPCSS                   $obj1.RequiredServic... System.String
Name                    DcomLaunch              $obj1.RequiredServic... System.String
DisplayName             DCOM Server Process ... $obj1.RequiredServic... System.String
MachineName             .                       $obj1.RequiredServic... System.String
ServiceName             DcomLaunch              $obj1.RequiredServic... System.String
Name                    RpcEptMapper            $obj1.RequiredServic... System.String
DisplayName             RPC Endpoint Mapper     $obj1.RequiredServic... System.String
(...)

And here’s the source code for Get-ObjectProperty. It is slightly more complex than just a couple of lines but still amazingly short, given the job it does for you.

It utilizes the exact same techniques that were just explained, so once you feel comfortable with the simple examples above, you can try and digest this one as well – or simply use it as a tool and not worry about its XML magic:

Function Get-ObjectProperty
{
  param
  (
    $Name = '*',
    $Value = '*',
    $Type = '*',
    [Switch]$IsNumeric,

    [Parameter(Mandatory=$true,ValueFromPipeline=$true)]
    [Object[]]$InputObject,

    $Depth = 4,
    $Prefix = '$obj'
  )

  Begin
  {
    $x = 0
    Function Get-Property
    {
      param
      (
        $Node,
        [String[]]$Prefix
      )

      $Value = @{Name='Value'; Expression={$_.'#text' }}
      Select-Xml -Xml $Node -XPath 'Property' | ForEach-Object {$i=0} {
        $rv = $_.Node | Select-Object -Property Name, $Value, Path, Type
        $isCollection = $rv.Name -eq 'Property'

        if ($isCollection)
        {
          $CollectionItem = "[$i]"
          $i++
          $rv.Path = (($Prefix) -join '.') + $CollectionItem
        }
        else
        {
          $rv.Path = ($Prefix + $rv.Name) -join '.'
        }

        $rv

        if (Select-Xml -Xml $_.Node -XPath 'Property')
        {
          if ($isCollection)
          {
            $PrefixNew = $Prefix.Clone()
            $PrefixNew[-1] += $CollectionItem
            Get-Property -Node $_.Node -Prefix ($PrefixNew )
          }
          else
          {
            Get-Property -Node $_.Node -Prefix ($Prefix + $_.Node.Name )
          }
        }
      }
    }
  }

  Process
  {
    $x++
    $InputObject |
    ConvertTo-Xml -Depth $Depth |
    ForEach-Object { $_.Objects } |
    ForEach-Object { Get-Property $_.Object -Prefix $Prefix$x  } |
    Where-Object { $_.Name -like "$Name" } |
    Where-Object { $_.Value -like $Value } |
    Where-Object { $_.Type -like $Type } |
    Where-Object { $IsNumeric.IsPresent -eq $false -or $_.Value -as [Double] }
  }
}

  • rudyschockaert

    Hi Tobias,
    I wonder if you have seen Joel Bennett’s XML DSL : http://poshcode.org/3926.

    # An example:
    [XNamespace]$atom="http`://www.w3.org/2005/Atom"
    [XNamespace]$dc = "http`://purl.org/dc/elements/1.1"

    New-XDocument ($atom + "feed") -Encoding "UTF-16" -$([XNamespace]::Xml +'lang') "en-US" -dc $dc {
      title {"Test First Entry"}
      link {"http`://HuddledMasses.org"}
      updated {(Get-Date -f u) -replace " ","T"}
      author {
        name {"Joel Bennett"}
        uri {"http`://HuddledMasses.org"}
      }
      id {"http`://huddledmasses.org/" }
      entry {
        title {"Test First Entry"}
        link {"http`://HuddledMasses.org/new-site-new-layout-lost-posts/" }
        id {"http`://huddledmasses.org/new-site-new-layout-lost-posts/" }
        updated {(Get-Date 10/31/2003 -f u) -replace " ","T"}
        summary {"Ema Lazarus' Poem"}
        link -rel license -href "http`://creativecommons.org/licenses/by/3.0/" -title "CC By-Attribution"
        dc:rights { "Copyright 2009, Some rights reserved (licensed under the Creative Commons Attribution 3.0 Unported license)" }
        category -scheme "http`://huddledmasses.org/tag/" -term "huddled-masses"
      }
    } | % { $_.Declaration.ToString(); $_.ToString() }

    # results in

    Test First Entry
    http ://HuddledMasses.org
    2009-07-29T17:25:49Z

    Joel Bennett
    http ://HuddledMasses.org

    http ://huddledmasses.org/

    Test First Entry
    http ://HuddledMasses.org/new-site-new-layout-lost-posts/
    http ://huddledmasses.org/new-site-new-layout-lost-posts/
    2003-10-31T00:00:00Z
    Ema Lazarus’ Poem

    Copyright 2009, Some rights reserved (licensed under the Creative Commons Attribution 3.0 Unported license)

    • Tobias Weltner

      Thanks for sharing! We focused on built-in XML support without the need for 3rd party extensions.
      If you need to create XML documents frequently, I am sure extensions like the one you mention are really helpful.

  • Chris

    Very nice! I’ll use it tomorrow at work! I just wanted to know how you would add a server only if it does not already exist in the XML, and otherwise modify its information. I can see some ways, but which one is the best in your opinion?

    best regards
    chris

  • Roy

    What a great article – thank you!

© 2014 PowerShell Magazine. All rights reserved. XHTML / CSS Valid.