I've recently had to take on PowerShell at work. It's a funny language, and there are some real gotchas that you may not notice at first glance. My biggest gripe with the language is its false sense of normality: once you've scratched your head for a bit, you realise what a horrid mess it is under the surface.

variable scoping

There are several different scopes of access for variables in PowerShell, and the list seems to keep growing.

  1. global: as the name suggests, variables that are declared globally are global.
  2. script: as the name suggests, variables that are declared with script scope remain within the script.
  3. function: as the name suggests, variables that are declared within the function, remain within that scope.
  4. local: remain in this local scope...

Let's see this in practice.

#!/usr/bin/powershell

$global:a = 1;

function second {
  $a++
  write-host "Second (a): $($a)"
}

function first {
  $a++
  write-host "First (a): $($a)"
  second
}

first
write-host "main (a): $($a)"

Save the file (I named it 'scope.ps1'), then run it with PowerShell: powershell ./scope.ps1.

First (a): 2
Second (a): 3
main (a): 1

What's going on here? People familiar with most other languages would wince at output like this. We all understand the principle that variables declared inside a function live on the stack and are cleaned up when the function returns. So how did second see the value that first had set, while main still sees the original?

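It doesn't matter how you declare a variable. What matters is how you access it. Declaring '$global:a' does not do what we're expecting: a bare $a is read from the nearest scope in the call chain that has one, but written to the current function's scope, so each function increments its own copy and the global is never touched. Let's try something else.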

$global:a = 1;

function second {
  $global:a++
  write-host "Second (a): $($global:a)"
}

function first {
  $global:a++
  write-host "First (a): $($global:a)"
  second
}

first
write-host "main (a): $($global:a)"

After running we find the following output. This is what we're after.

First (a): 2
Second (a): 3
main (a): 3
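
One way to picture the difference between the two runs is a rough model in Python. This is only a sketch under my own assumptions, not how PowerShell is implemented: the Scope class, its read and increment methods and the use_global flag are all made-up names, and the script scope that sits between global and function scope is left out. The idea is that a bare $a is read from the nearest scope up the call chain but written into the current one, whereas $global:a always reads and writes the top-level scope.

```python
class Scope:
    """One frame in a chain of dynamic scopes (hypothetical model)."""

    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def root(self):
        # Walk up to the outermost (global) scope.
        scope = self
        while scope.parent is not None:
            scope = scope.parent
        return scope

    def read(self, name):
        # Reads fall through to the caller's scope, then its caller, and so on.
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        return None  # an unset variable reads as nothing

    def increment(self, name, use_global=False):
        # Bare $a++ writes to the current scope; $global:a++ writes to the root.
        target = self.root() if use_global else self
        target.vars[name] = (target.read(name) or 0) + 1


def second(caller, out, use_global):
    scope = Scope(parent=caller)
    scope.increment("a", use_global)
    out.append(f"Second (a): {scope.read('a')}")


def first(caller, out, use_global):
    scope = Scope(parent=caller)
    scope.increment("a", use_global)
    out.append(f"First (a): {scope.read('a')}")
    second(scope, out, use_global)


def run(use_global):
    out = []
    global_scope = Scope()
    global_scope.vars["a"] = 1
    first(global_scope, out, use_global)
    out.append(f"main (a): {global_scope.read('a')}")
    return out
```

In this model run(False) gives 2, 3 and 1, matching the first script, and run(True) gives 2, 3 and 3, matching the second.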

It's not looking as intuitive as we'd expect though, is it? For example, here is what the equivalent Perl would look like:

#!/usr/bin/perl

use strict;
use warnings;

my $a = 1;

sub second {
  $a++;
  print "Second (a): $a\n";
}

sub first {
  $a++;
  print "First (a): $a\n";
  second();
}

first();
print "main (a): $a\n";

With the following output:

First (a): 2
Second (a): 3
main (a): 3

This took a fraction of the time to execute and a fraction of the time to write, thanks to fewer finger gymnastics to produce the $( $ ) constructs.

So, what does local do?

#!/usr/bin/powershell

$a = 1;

function second {
  $local:a++
  write-host "Second (a): $($local:a)"
}

function first {
  $local:a++
  write-host "First (a): $($local:a)"
  second
}

first
write-host "main (a): $($local:a)"

Which gives the following:

First (a): 1
Second (a): 1
main (a): 1

This I can agree with; it does what I would expect at the level of each function. So, what should you take away from this? If you're going to use functions in your script, always access the variable as 'local'. Otherwise you might run into problems where a function starts using a variable, expecting it to start at zero, but instead picks up whatever value the most recent caller on the stack set it to. In most other languages a subroutine does not inherit variables from whichever function happened to call it.

regex

Another feature of PowerShell that causes me headaches is what exactly counts as a double-quoted string. For example, this is going to return a plain dollar sign '$'. On the other hand, this should not parse "$". When we run that, we see that the single-quoted string returns a dollar, but so does the double-quoted string. This contradicts the rule that $ is a metacharacter signifying that the following data should be parsed. It turns out that $ is only expanded when a variable name or a $( ) subexpression follows it; a lone $ is left as it is.

webclient

Another point of annoyance is the webclient. Imagine this: you simply want to connect to a site, post some data and check the output for signs of an accepted payload (since people hardly use the HTTP status codes these days).

One might start with something like this:

$wc = new-object System.Net.WebClient
$a = $wc.UploadFile( 'https://site/upload/', $file )

This looks normal, but UploadFile returns the response body as a byte array, so inspecting $a we just find a stream of numbers, one per byte of the response.

Yes, this reminds me of PowerShell DSC OMI nonsense again, where MOF data is changed from data to ord(C) values. Nonsense.

If you want to look at the response data, using something like the following will get you that:

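# $a is a byte array (byte[]); turn it back into text one byte at a time.
# This only works for single-byte characters such as ASCII.
$str = ""
foreach( $i in $a ) {
    $str += [char]$i
}
write-host $str
# Alternative, assuming the response is UTF-8:
# write-host ([System.Text.Encoding]::UTF8.GetString($a))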

I fail to see why the end user should have to do this. We all love Perl; LWP::UserAgent does it correctly. Well, every language that I can shake an editor at does it correctly.
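
For what it's worth, the 'numbers' are the bytes of the response body, and turning them into text is a decoding step. Here is a small Python sketch of the difference between converting byte by byte and decoding the whole buffer. The response values are made up for illustration and UTF-8 is an assumption, since the real encoding depends on the server.

```python
# Hypothetical response body as a list of byte values: "OK €5" in UTF-8.
response = [79, 75, 32, 226, 130, 172, 53]

# Converting each byte to a character on its own. Fine for ASCII,
# but the three bytes of the euro sign become three separate characters.
per_byte = "".join(chr(b) for b in response)

# Decoding the whole buffer with an explicit encoding (assumed UTF-8).
decoded = bytes(response).decode("utf-8")

print(len(per_byte), len(decoded))
print(decoded)
```

The per-byte version ends up two characters longer than the decoded one, which is why decoding the buffer in one go is the safer habit once the payload is not plain ASCII.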

loops

To start, we can take a look at what loops and their keywords normally do.

  1. return: stop processing and return to the caller
  2. break: stop the current loop and carry on after it
  3. continue: skip to the next item in the loop and carry on from there

This works in most languages: C, Perl (which spells break and continue as last and next), Java, C# and PowerShell.

Take a look at the following and pay attention to the output. There are three items: a, b and c. At the end of the function we show that it has finished by printing 'fin.', then return to the caller and print 'done'.

function test {
  $vars = @( "a", "b", "c" );

  foreach( $_ in $vars ) {
    if( $_ -eq "b" ) {
      # break
      # continue
      # return
    }

    write-host $_
  }
  write-host "fin."
}

test
write-host "done"

The first time we run this, with break, continue and return all commented, we see the output:

a
b
c
fin.
done

If you change 'write-host "fin."' to 'exit' then you would not see the word 'fin.' or 'done' as the script would have stopped there.

The result of uncommenting exactly one of the break, continue and return lines is shown in the following table, one column per keyword:

break   continue   return
a       a          a
fin.    c          done
done    fin.
        done

Now, let's try using a PowerShell pipe and see if that changes anything. A pipe can be used in place of a loop: the array is piped into ForEach-Object, which runs a script block once for each input object.

Now the code sample looks like this:

function test {
  $vars = @( "a", "b", "c" );

  $vars | ForEach-Object {
    if( $_ -eq "b" ) {
      # break
      # continue
      # return
    }

    write-host $_
  }
  write-host "fin."
}

test
write-host "done"

If we run this in its current condition we will see exactly what we saw before: each of the array items is printed, followed by 'fin.' and 'done'. Let's go through the table and see what we get when each of the comments is removed by itself.

break   continue   return
a       a          a
                   c
                   fin.
                   done

Don't ever expect PowerShell to do what you'd expect. Don't ever use this for real world tasks. What is absolutely bonkers is that MS is pushing this as their tool language for system administration. If you want your systems to spontaneously break, then sure, use a language where one minute 'return' means continue, and break or continue mean exit. How it has passed any QC I have no idea; how people can work with it, I have no idea either.

MS seem somewhat in denial over this berserk behaviour:

"simply use the RETURN keyword instead.... and now behaves as expected and intended." -- technet.

false generator

PowerShell has syntax that makes the programmer think that a program reads pipes like a Unix pipeline. However, written the obvious way, it does not.

#!/usr/bin/pwsh

function make_list() {
    while( !$null ) {
        write-output "element"
    }
}

function read_list() {
    [cmdletbinding()]
    param(
        [parameter( mandatory=$true, valuefrompipeline=$true )]
        [string]$item
    )

    write-output "Read element: $item"
}

make_list | read_list

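As written, the above program will not print anything at all until make_list has finished, which here is never. The body of read_list is not run once per element; without a process { } block it runs only after all of the input has arrived, so to get Unix-style streaming you have to know to wrap the body in process { }.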

In Python, or in Unix shells, the above recipe would be a force multiplier. Consider a web server: it would be bad form to build up a job that occupies several hundred megabytes before sending any data. It is far better to work in smaller buffers, which allows you to have many more requests in the air, and the end user gets data immediately rather than having to wait for the full job to complete before the first byte is sent.

This simple Python snippet shows how simple a generator implementation can be.

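#!/usr/bin/python3

def make_item():
    # Generator: hands out one element at a time, forever.
    while True:
        yield "element"

def read_item():
    # Consume each element as soon as it is produced.
    for item in make_item():
        print("read {0}".format(item))

read_item()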

Output is immediate and memory consumption is trivial. Python doesn't have a | operator that works the way PowerShell's does, thankfully.
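
To make the memory claim a little more concrete, here is a bounded sketch of a two-stage generator pipeline. The produced list and the numbering of the elements are my own additions for illustration; they record how far the producer has actually run when the consumer stops after three items.

```python
from itertools import islice

produced = []  # bookkeeping only: which elements the producer has made


def make_item():
    n = 0
    while True:
        n += 1
        produced.append(n)
        yield "element {0}".format(n)


def read_item(items):
    # Second stage: transforms each element as it arrives.
    for item in items:
        yield "read {0}".format(item)


# Pull just three results through the whole pipeline.
first_three = list(islice(read_item(make_item()), 3))

print(first_three)
print(produced)
```

The producer is only ever asked for as many elements as the consumer pulls, so nothing is buffered between the stages even though make_item would happily run forever.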

headache code

If you're still not convinced, take a look at adding an extension in Azure.

Why do I cite this as horrid? It's terrible to read, and there's no way to make it more pleasant. What is the difference between Add-AzureAccount and Login-AzureRmAccount?

This is vendor-specific code, but this particular vendor happens to be the owner of the language too.

suffers the very same ailments of windows

PowerShell suffers from a broken community and broken upstream support. Open source communities frequently take user submissions and, once they meet the project's standards and pass testing, merge them so that they become part of a well-maintained product.

There are well understood tools in Unix that are expected to work in all environments. du is one of those tools; it prints the sizes of a directory and its descendants.

On an MS system nothing does this out of the box, so someone has to write an awful lot of code to get the very same functionality. In order to support the wide range of tasks that users expect from a 'shell', MS has to become inclusive and merge user contributions into the master. This has not been merged. There are plenty of examples of code that does something similar (often lacking the -h flag to get human units).