#PSTip Count occurrences of a word using a hash table

On one of the PowerShell forums, someone asked for help with getting the 15 most used words in a webpage. The core of the answer to that question is an amazingly clever use of a hash table.

PS> $wordList = 'three','three','one','three','two','two'
PS> $wordStatistic = $wordList | ForEach-Object -Begin { $wordCounts=@{} } -Process { $wordCounts.$_++ } -End { $wordCounts }
PS> $wordStatistic

Name  Value
----  -----
one   1
three 3
two   2

The result correctly states that the word ‘three’ occurs in the word list three times, ‘two’ occurs twice, and ‘one’, not surprisingly, once.

To understand how the trick works, let’s go through it step by step. The first word in the $wordList array, ‘three’, is passed down the pipeline. The $wordCounts hash table, created in the Begin block, is queried for a key named ‘three’, which in our case is represented by $_, the variable that holds the current pipeline object. If the key is not present in the hash table, it is created automatically, and its missing value is treated as zero. The value of the key-value pair named ‘three’ is then increased by one using the increment operator ‘++’. One by one, the ForEach-Object loop processes all the words in the array, incrementing the appropriate key by one on each iteration. The End block then outputs the finished hash table, which is stored in $wordStatistic.

To complete the task, you simply enumerate the key-value pairs of the $wordStatistic hash table using the GetEnumerator() method, sort them by the Value property in descending order, and select just the most used words, in our case just one.

$wordStatistic.GetEnumerator() |
Sort-Object -Property Value -Descending |
Select-Object -First 1

Name  Value
----  -----
three 3
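
The automatic creation of a missing key is the part of the trick that is easiest to overlook, so it may help to try it in isolation. The following is only a minimal sketch: the first half repeats the increment on an empty hash table, and the second half applies the same counting to a made-up sentence. The $text, $counts, and $top variables and the sample sentence are assumptions for illustration and do not come from the forum thread.

```powershell
# Start with an empty hash table, as the Begin block does.
$wordCounts = @{}

# A key that has never been set reads as $null.
$null -eq $wordCounts.three    # True

# Incrementing a missing key treats $null as 0, so the key is created with the value 1.
$wordCounts.three++
$wordCounts.three++
$wordCounts.two++

# Hypothetical sample text standing in for the contents of a web page.
$text = 'the quick fox and the lazy dog and the cat'

# Split on whitespace, drop empty strings, and count each word.
$counts = @{}
$text -split '\s+' | Where-Object { $_ } | ForEach-Object { $counts.$_++ }

# Keep the two most used words; use -First 15 for the original forum question.
$top = $counts.GetEnumerator() |
    Sort-Object -Property Value -Descending |
    Select-Object -First 2
$top
```

With this sample sentence the two entries that remain are ‘the’ (three occurrences) and ‘and’ (two occurrences). Words with equal counts can come out in any order, because a hash table does not keep its keys in insertion order.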