
I take a lot of free online coding classes, mainly from Coursera and Udacity, and I’ve picked up a lot of programming tricks in other languages that are easily translated to Windows PowerShell.

In a Java class on Udacity, I learned a cool way to find duplicates in any collection. It uses the fact that the keys in a hash table must be unique. The Add method throws an “Item has already been added” error if you try to add a key that’s already in the hash table.

In this example, I try to add “Day” to a hash table that already has a “Day” key. The value is arbitrary.

 

    $hash = @{ Day = "Wednesday"; Weather = "Sunny" }
    $hash.Add("Day", "Friday")

 

    ERROR: Exception calling "Add" with "2" argument(s): "Item has already been added. Key
    in dictionary: 'Day'  Key being added: 'Day'"
    Test.ps1 (15): ERROR: At Line: 15 char: 1
    ERROR: + $hash.Add("Day", "Friday")
    ERROR: + ~~~~~~~~~~~~~~~~~~~~~~~~~~
    ERROR: + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    ERROR: + FullyQualifiedErrorId : ArgumentException
    ERROR:
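
To see which exception type a Catch block has to match, you can trap the error and inspect it. This is a minimal sketch; the `$outerType` and `$innerType` variable names are only for illustration. The type names in the comments are the ones the error output above reports.

```powershell
$hash = @{ Day = "Wednesday"; Weather = "Sunny" }
$outerType = $null
$innerType = $null

try
{
    # "Day" is already a key, so Add throws.
    $hash.Add("Day", "Friday")
}
catch
{
    # $_ is the error record. Its Exception property is what a typed Catch clause matches.
    $outerType = $_.Exception.GetType().FullName
    # PowerShell wraps the .NET exception thrown by the method call.
    $innerType = $_.Exception.InnerException.GetType().FullName
}

$outerType    # System.Management.Automation.MethodInvocationException
$innerType    # System.ArgumentException
$hash["Day"]  # Wednesday (the failed Add did not change the existing value)
```

The failed Add leaves the hash table untouched, which is what makes it safe to keep looping after a duplicate.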

 

To detect a duplicate in a small collection, create a hash table and add the items to the hash table as keys.

    $hash = @{ }
    "a", "b", "a", "c", "d" | ForEach { $hash.Add($_, 0) }

 

Use a Try block to add each item as a key with a value of 0. If a MethodInvocationException occurs in the Try block, control passes to the Catch block instead of the error interrupting the script. I use the Catch block to save the duplicates in an array.

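    $Items = "a", "b", "a", "c", "d"
    $hash = @{ }          # Keys collect the unique items
    $duplicates = @()     # Items whose key was already present

    foreach ($item in $Items)
    {
        try
        {
            # Add throws if $item is already a key
            $hash.Add($item, 0)
        }
        catch [System.Management.Automation.MethodInvocationException]
        {
            $duplicates += $item
        }
    }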

 

You can return the duplicates that you saved and/or the unique items, which are the keys in the hash table.

    $hash.keys
    c
    a
    d
    b
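
One thing to watch: an item that appears three times lands in the duplicates array twice, because every Add after the first one fails. The sketch below repeats the same loop with a different sample list and then collapses the repeats with Select-Object -Unique. The `$distinctDuplicates` name is only for illustration.

```powershell
$Items = "a", "b", "a", "a", "c"
$hash = @{ }
$duplicates = @()

foreach ($item in $Items)
{
    try
    {
        $hash.Add($item, 0)
    }
    catch [System.Management.Automation.MethodInvocationException]
    {
        # Runs for the second and the third "a".
        $duplicates += $item
    }
}

$duplicates    # a, a

# Collapse repeated duplicates into one entry per value.
$distinctDuplicates = @($duplicates | Select-Object -Unique)
$distinctDuplicates    # a
```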

 

For the final version of my little script, I replace the hash table with an ordered dictionary, which preserves the order in which the keys were added. I also allow users to pipe the items to the script by setting ValueFromPipeline in the Parameter attribute and adding the Process block that supports it.

<#
    .SYNOPSIS
        Gets duplicates or unique values in a collection.

    .DESCRIPTION
        The Get-Duplicates.ps1 script takes a collection and returns
        the duplicates (by default) or unique members (use the Unique
        switch parameter).

    .PARAMETER  Items
        Enter a collection of items. You can also pipe the items to
        Get-Duplicates.ps1.

    .PARAMETER  Unique
        Returns unique items instead of duplicates. By default, Get-Duplicates.ps1
        returns only duplicates.

    .EXAMPLE
        PS C:\> .\Get-Duplicates.ps1 -Items 1,2,3,2,4
        2

    .EXAMPLE
        PS C:\> 1,2,3,2,4 | .\Get-Duplicates.ps1
        2

    .EXAMPLE
        PS C:\> .\Get-Duplicates.ps1 -Items 1,2,3,2,4 -Unique
        1
        2
        3
        4

    .INPUTS
        System.Object[]

    .OUTPUTS
        System.Object[]

    .NOTES
    ===========================================================================
     Created with:     SAPIEN Technologies, Inc., PowerShell Studio 2014 v4.1.72
     Created on:       10/15/2014 9:34 AM
     Created by:       June Blender (juneb)
#>

param
(
    [Parameter(Mandatory = $true,
               ValueFromPipeline = $true)]
    [Object[]]
    $Items,
    
    [Parameter(Mandatory = $false)]
    [Switch]
    $Unique
)
Begin
{
    $hash = [ordered]@{ }
    $duplicates = @()
}
Process
{
    foreach ($item in $Items)
    {
        try
        {
            $hash.add($item, 0)
        }
        catch [System.Management.Automation.MethodInvocationException]
        {
            $duplicates += $item
        }
    }
}
End
{
    if ($unique)
    {
        return $hash.keys
        
    }
    elseif ($duplicates)
    {
        return $duplicates
    }
}
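
To see why the script needs both a Process block and a foreach loop, and what the ordered dictionary buys you, here is a small sketch. `Test-PipelineBinding` is a hypothetical function written only for this illustration and is not part of Get-Duplicates.ps1. It counts how often the Process block runs and reports the keys in the order they were added. It uses the dictionary's Contains method instead of a Try block to keep the example short.

```powershell
function Test-PipelineBinding
{
    param
    (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [Object[]]
        $Items
    )
    Begin
    {
        $seen = [ordered]@{ }
        $calls = 0
    }
    Process
    {
        # Runs once per piped object, or once in total when -Items is used.
        $calls++
        foreach ($item in $Items)
        {
            if (-not $seen.Contains($item)) { $seen.Add($item, 0) }
        }
    }
    End
    {
        [PSCustomObject]@{
            ProcessCalls = $calls
            Keys         = @($seen.Keys) -join ','
        }
    }
}

$piped = "d", "a", "c", "a" | Test-PipelineBinding
$param = Test-PipelineBinding -Items "d", "a", "c", "a"

$piped.ProcessCalls   # 4
$param.ProcessCalls   # 1
$piped.Keys           # d,a,c
$param.Keys           # d,a,c
```

When the items are piped, each one arrives as its own one-element array, so the Process block runs four times. With the Items parameter, the whole array arrives in a single call and the foreach loop does the work. Either way, the keys come back in the order they were first seen.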

 

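Remember that qualification about small collections of items? This strategy is a little programming trick that is not optimized for large data sets: catching an exception for every duplicate and growing an array with += both get expensive as the collection grows. For those, stick with Microsoft.PowerShell.Utility\Get-Unique (which expects sorted input) and other optimized methods.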
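
If you want the same idea without relying on exceptions, one option is the .NET HashSet class, whose Add method returns False for an item that is already present instead of throwing. This is a sketch of a swapped-in technique, not something the script above uses, and the variable names are only for illustration. Note that a HashSet of strings is case-sensitive by default, unlike a PowerShell hash table.

```powershell
$Items = "a", "b", "a", "c", "d"

# New-Object keeps this compatible with older Windows PowerShell versions.
$seen = New-Object 'System.Collections.Generic.HashSet[string]'
$duplicates = New-Object 'System.Collections.Generic.List[string]'

foreach ($item in $Items)
{
    # Add returns $false when the item is already in the set.
    if (-not $seen.Add($item))
    {
        $duplicates.Add($item)
    }
}

$duplicates    # a
$seen.Count    # 4
```

The generic List grows in place, so this version also avoids rebuilding an array on every duplicate the way += does.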

June Blender is a technology evangelist at SAPIEN Technologies, Inc. You can follow her on Twitter at @juneb_get_help.

Copyright © 2024 SAPIEN Technologies, Inc.