2011-03-20

PowerShell: the case of the rowless DataReader

I'm taking a page out of Mark Russinovich's blog with the Holmesian (or is that Doylist?) title. My sleuthing is slightly less interesting than his, but anyway.

PowerShell is a terrifically flexible tool for getting work done. Unfortunately, this flexibility comes at a price: it's not always obvious what it's doing. This is a problem of scripting languages in general, and PowerShell is no exception. Consider the following code:

function _using($o, [scriptblock] $t) {
    try {
        &$t
    } finally {
        if ($o -is [idisposable]) { $o.dispose() }
    }
}

This is obviously an attempt to emulate the "using" statement from languages like C# and VB.NET. You can rightfully argue that scripts don't really have a need for it (because scripts are short-running, and resources will be cleaned up when PowerShell exits), but let's leave that argument aside for a moment.

This function has a problem, and a rather serious one at that, but it's not at all obvious what it is. Here's some code to demonstrate the issue (which is pared down to bare essentials to illustrate the problem only):

function Get-SqlData {
    $connection = New-Object ([Data.SqlClient.SqlConnection]) "data source=.\sqlexpress;integrated security=sspi"
    $connection.Open()
    $command = New-Object ([Data.SqlClient.SqlCommand])
    $command.Connection = $connection
    $command.CommandText = "select 1 as a"
    $command.CommandType = "Text"
    $datareader = $command.ExecuteReader()
    _using ($datareader) {
        $datareader.Read()
    }
}

In order to succesfully run this code, you need a reachable SQL Server. In this case, I'm using a local SQL Express instance. It doesn't matter what's on the server, as we don't access anything.

What should this code output? It should print the result of the call to .Read(). Since we have one row, you would expect it to output "True". However, it outputs "False", as if no rows are available.

Getting the function to print "True" is easy, and requires only a very slight change:
    _using ($null) {
        $datareader.Read()
    }

If we do not pass the $datareader variable, everything is fine (except, of course, that it doesn't get disposed).

We can observe what's going wrong, but the exact point where things are going wrong is invisible to us. It's the result of passing a parameter, which is something internal to the host. So, let's crack open the host and look inside!

I'll be assuming you're familiar with WinDbg and debugging managed code. If not, well, I'd love to point you to a good, complete tutorial, but I'm not familiar with one and a cursory search turned up nothing. I do have a very brief overview of Windows debuggers that includes some helpful links, but beyond that you'll just have to sit back and enjoy the ride.

We hook up WinDbg to PowerShell. Unfortunately, calling Debugger.Break() from a PowerShell-script has no effect — there is probably a good reason why and I even vaguely recall having looked into it once for maanged applications in general, but I'm too lazy to look into it now. Never mind, there's more than one way to skin a cat.
In WinDbg, we enter "sxe out:reader" to instruct the debugger to break if the program writes debug output that contains "reader". Now we can use Debugger.Log() to trigger breaks:
    $datareader = $command.ExecuteReader()

    [diagnostics.debugger]::log(0, "reader", "")
    _using ($datareader) {
        [diagnostics.debugger]::log(0, "reader", "")
        $datareader.Read()
    }

Running the code now will break right after the DataReader is created. We can use "!dumpstackobjects" to locate it:
OS Thread Id: 0x11f8 (7)
RSP/REG          Object           Name
r14              000000000302c7a0 System.Management.Automation.MethodInformation
000000001d58d048 00000000034eec20 System.Data.SqlClient.SqlDataReader
000000001d58d080 000000000325e180 System.Data.SqlClient.TdsParserStateObject
000000001d58d098 00000000034eec20 System.Data.SqlClient.SqlDataReader
000000001d58d0a0 00000000034eef10 System.Object[]    (System.Data.SqlClient.SqlBuffer[])
000000001d58d0b0 00000000032613d0 System.Byte[]    (System.Byte[])
...

Dumping the data reader instance with "!dumpobj  00000000034eec20" gives us:

Name: System.Data.SqlClient.SqlDataReader
MethodTable: 000007fef596df30
EEClass: 000007fef57d65c8
Size: 208(0xd0) bytes
 (C:\Windows\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fef87473f8  400018a        8        System.Object  0 instance 0000000000000000 __identity
000007fef59740a8  40017e0       10 ...lClient.TdsParser  0 instance 000000000325e070 _parser
000007fef59746a0  40017e1       18 ...ParserStateObject  0 instance 000000000325e180 _stateObj
000007fef596d180  40017e2       20 ...Client.SqlCommand  0 instance 00000000034ed000 _command
000007fef596cad0  40017e3       28 ...ent.SqlConnection  0 instance 00000000034e64c0 _connection
000007fef874ed78  40017e4       98         System.Int32  1 instance             1033 _defaultLCID
000007fef8746cd8  40017e5       bc       System.Boolean  1 instance                0 _dataReady
000007fef8746cd8  40017e6       bd       System.Boolean  1 instance                0 _haltRead
000007fef8746cd8  40017e7       be       System.Boolean  1 instance                1 _metaDataConsumed
000007fef8746cd8  40017e8       bf       System.Boolean  1 instance                0 _browseModeInfoConsumed
000007fef8746cd8  40017e9       c0       System.Boolean  1 instance                0 _isClosed
000007fef8746cd8  40017ea       c1       System.Boolean  1 instance                1 _isInitialized
000007fef8746cd8  40017eb       c2       System.Boolean  1 instance                1 _hasRows
000007fef5e63490  40017ec       9c         System.Int32  1 instance                0 _altRowStatus
000007fef874ed78  40017ed       a0         System.Int32  1 instance               -1 _recordsAffected
000007fef874ed78  40017ee       a4         System.Int32  1 instance               30 _timeoutSeconds
000007fef5975698  40017ef       a8         System.Int32  1 instance             2008 _typeSystem
000007fef5973528  40017f0       30 ...ent.SqlStatistics  0 instance 0000000000000000 _statistics
000007fef87358f8  40017f1       38      System.Object[]  0 instance 00000000034eef10 _data
000007fef5e96628  40017f2       40 ...t.SqlStreamingXml  0 instance 0000000000000000 _streamingXml
000007fef5975af0  40017f3       48 ...t._SqlMetaDataSet  0 instance 00000000034eedb8 _metaData
000007fef5e94f10  40017f4       50 ...DataSetCollection  0 instance 0000000000000000 _altMetaDataSetCollection
000007fef5975c88  40017f5       58 ...e.FieldNameLookup  0 instance 0000000000000000 _fieldNameLookup
000007fef59760b8  40017f6       ac         System.Int32  1 instance                0 _commandBehavior
000007fef874ed78  40017f8       b0         System.Int32  1 instance               29 ObjectID
000007fef5e59968  40017f9       60 ...tiPartTableName[]  0 instance 0000000000000000 _tableNames
000007fef8747b08  40017fa       68        System.String  0 instance 0000000000000000 _resetOptionsString
000007fef874ed78  40017fb       b4         System.Int32  1 instance                0 _nextColumnDataToRead
000007fef874ed78  40017fc       b8         System.Int32  1 instance                0 _nextColumnHeaderToRead
000007fef874b028  40017fd       80         System.Int64  1 instance 0 _columnDataBytesRead
000007fef874b028  40017fe       88         System.Int64  1 instance 0 _columnDataBytesRemaining
000007fef874b028  40017ff       90         System.Int64  1 instance 0 _columnDataCharsRead
000007fef8749400  4001800       70        System.Char[]  0 instance 0000000000000000 _columnDataChars
000007fef8747dd0  4001801       78     System.Exception  0 instance 0000000000000000 _rowException
000007fef874ed78  40017f7      9c0         System.Int32  1   shared           static _objectTypeCount
                                 >> Domain:Value  00000000002a8400:NotInit  <<

Ignore all this for now, as we're not really interested in the details.We'll let the code proceed and hit the second breakpoint, after the DataReader has been passed to _using.

If we dump the object again, we can see the difference in state (and hopefully a clue as to why .Read() will not work):

Name: System.Data.SqlClient.SqlDataReader
MethodTable: 000007fef596df30
EEClass: 000007fef57d65c8
Size: 208(0xd0) bytes
 (C:\Windows\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fef87473f8  400018a        8        System.Object  0 instance 0000000000000000 __identity
000007fef59740a8  40017e0       10 ...lClient.TdsParser  0 instance 0000000000000000 _parser
000007fef59746a0  40017e1       18 ...ParserStateObject  0 instance 000000000325e180 _stateObj
000007fef596d180  40017e2       20 ...Client.SqlCommand  0 instance 00000000034ed000 _command
000007fef596cad0  40017e3       28 ...ent.SqlConnection  0 instance 00000000034e64c0 _connection
000007fef874ed78  40017e4       98         System.Int32  1 instance             1033 _defaultLCID
000007fef8746cd8  40017e5       bc       System.Boolean  1 instance                0 _dataReady
000007fef8746cd8  40017e6       bd       System.Boolean  1 instance                0 _haltRead
000007fef8746cd8  40017e7       be       System.Boolean  1 instance                1 _metaDataConsumed
000007fef8746cd8  40017e8       bf       System.Boolean  1 instance                0 _browseModeInfoConsumed
000007fef8746cd8  40017e9       c0       System.Boolean  1 instance                0 _isClosed
000007fef8746cd8  40017ea       c1       System.Boolean  1 instance                1 _isInitialized
000007fef8746cd8  40017eb       c2       System.Boolean  1 instance                1 _hasRows
000007fef5e63490  40017ec       9c         System.Int32  1 instance                0 _altRowStatus
000007fef874ed78  40017ed       a0         System.Int32  1 instance               -1 _recordsAffected
000007fef874ed78  40017ee       a4         System.Int32  1 instance               30 _timeoutSeconds
000007fef5975698  40017ef       a8         System.Int32  1 instance             2008 _typeSystem
000007fef5973528  40017f0       30 ...ent.SqlStatistics  0 instance 0000000000000000 _statistics
000007fef87358f8  40017f1       38      System.Object[]  0 instance 00000000034eef10 _data
000007fef5e96628  40017f2       40 ...t.SqlStreamingXml  0 instance 0000000000000000 _streamingXml
000007fef5975af0  40017f3       48 ...t._SqlMetaDataSet  0 instance 00000000034eedb8 _metaData
000007fef5e94f10  40017f4       50 ...DataSetCollection  0 instance 0000000000000000 _altMetaDataSetCollection
000007fef5975c88  40017f5       58 ...e.FieldNameLookup  0 instance 0000000000000000 _fieldNameLookup
000007fef59760b8  40017f6       ac         System.Int32  1 instance                0 _commandBehavior
000007fef874ed78  40017f8       b0         System.Int32  1 instance               29 ObjectID
000007fef5e59968  40017f9       60 ...tiPartTableName[]  0 instance 0000000000000000 _tableNames
000007fef8747b08  40017fa       68        System.String  0 instance 0000000000000000 _resetOptionsString
000007fef874ed78  40017fb       b4         System.Int32  1 instance                0 _nextColumnDataToRead
000007fef874ed78  40017fc       b8         System.Int32  1 instance                0 _nextColumnHeaderToRead
000007fef874b028  40017fd       80         System.Int64  1 instance 0 _columnDataBytesRead
000007fef874b028  40017fe       88         System.Int64  1 instance -1 _columnDataBytesRemaining
000007fef874b028  40017ff       90         System.Int64  1 instance 0 _columnDataCharsRead
000007fef8749400  4001800       70        System.Char[]  0 instance 0000000000000000 _columnDataChars
000007fef8747dd0  4001801       78     System.Exception  0 instance 0000000000000000 _rowException
000007fef874ed78  40017f7      9c0         System.Int32  1   shared           static _objectTypeCount
                                 >> Domain:Value  00000000002a8400:NotInit  <<

Scanning from top to bottom, the thing that's immediately striking is that the "_parser" field is now null. The DataReader object has definitely changed as a result of calling our function, even though it hasn't even done anything with the object at this point — or so it might appear.

Now that we have a foot in the door, we can watch for changes to the _parser field to catch the change in the act. To that end, we run the code again, have it hit the breakpoint again and dump the stack again:

OS Thread Id: 0xe0c (7)
RSP/REG          Object           Name
r14              000000000302c7a0 System.Management.Automation.MethodInformation
000000001d66c7d8 00000000032dfad0 System.Data.SqlClient.SqlDataReader
...

Let's set a processor breakpoint on the field:

ba r 8 00000000032dfad0+10

This sets a breakpoint, the size of a reference (8 bytes, since we're running on 64-bit) on the location of the _parser field (which has field offset 10, as evidenced in the above dump). As soon as anything accesses this field, we'll know.

Continuing will hit the breakpoint, at a very telling location (!CLRStack):

OS Thread Id: 0xe0c (7)
Child-SP         RetAddr          Call Site
000000001d66c870 000007fef592977b System.Data.SqlClient.SqlDataReader.ReadInternal(Boolean)
000000001d66c930 000007fee83b1f93 System.Data.Common.DbEnumerator.MoveNext()
000000001d66c9a0 000007fee83afc4c System.Management.Automation.ParserOps.MoveNext(System.Management.Automation.ExecutionContext, System.Management.Automation.Token, System.Collections.IEnumerator)
000000001d66ca30 000007fee83af323 System.Management.Automation.ParameterBinderBase.DecodeValue(System.Object, System.Type, System.Management.Automation.ParameterCollectionTypeInformation, Boolean)
000000001d66caf0 000007fee8832b92 System.Management.Automation.ParameterBinderBase.BindParameter(System.Management.Automation.CommandParameterInternal, System.Management.Automation.CompiledCommandParameter, System.Management.Automation.ParameterBindingFlags)
000000001d66ccf0 000007fee83ae99e System.Management.Automation.ParameterBinderController.BindParameter(UInt32, System.Management.Automation.CommandParameterInternal, System.Management.Automation.MergedCompiledCommandParameter, System.Management.Automation.ParameterBindingFlags)
...

Using  Reflector (or, since that's no longer free, ILSpy) to pull apart System.Management.Automation.ParameterBinderBase.DecodeValue gives us the full story.

What's happening here is that PowerShell sees that the DataReader is an enumerable object, so it happily goes ahead and enumerates it to see if it can decompose it into a collection of tokens. PowerShell sees that the DataReader does not produce tokens, so it discards the enumeration and passes the DataReader on as a single object, but the damage is already done. Enumerating a DataReader is something you can only do once, and it consumes the data.
 
Tokens are something internal to PowerShell, so this is arguably a bug — it's not the user's fault if the PowerShell team couldn't figure out a way of passing around token enumerations without invoking side effects on things that have nothing to do with tokens. This is what I meant when I said flexibility comes at a price: with all the ways PowerShell has for handling enumerations, passing along parameters and doing conversions, something's bound to break, or at least act in a non-intuitive fashion.

The problem is quite fundamental, and not easily resolved without resorting to trickery. By far the simplest way around this is to abandon the idea of _using in the first place. Like I said at the beginning, scripts are not long-lived objects. They shouldn't need disposal mechanisms anyway. If you do need to dispose the occasional expensive object in the middle of a script, you can just insert an explicit call. Quick and practical — your boss will love you.

The second-simplest way is to abandon DataReader. Using DataAdapter and DataTable avoids these problems because in-memory collections can be enumerated as many times as you like with no adverse effects. These classes do consume more memory and processing cannot begin until all data has been read, which could be a problem with big data processing tasks.

Suppose you want to stick with DataReader, or you have another scenario with an IEnumerable-implementing object that you need to pass without invoking the enumeration. There is a workaround: pass the object as a single-element array. This trick is quite common in PowerShell scripting since it's particularly easy to do (put a single comma in front of the expression to turn it into an array) and there are plenty of scenarios for which PowerShell's unwrapping behavior is overly eager.

Obviously, this means you need to change your function to handle single-element arrays. It also places a burden on the caller, who must know to do this for IEnumerable-implementers, but we can check for this in the function itself so it doesn't get overlooked:

    _using (,$datareader) {
        $datareader.Read()
    }

...

function _using($o, [scriptblock] $t) {
    if ($o -is [collections.ienumerable] -and -not ($o -is [array])) {
        throw "Wrap IEnumerables in an array before passing them."
    }
    try {
        &$t
    } finally {
        $o|%{if ($_ -is [idisposable]) { $_.dispose() }}
    }
}


Obviously this is crude: there are IEnumerable-implementers that are not arrays that could be handled correctly with no further tricks, but we take no risks and demand that everything that's not an array be protected.

If you try it out (by removing the comma), you'll notice that the error message isn't really helpful because the location of the error is the function, rather than the caller. There are ways around this too, but maybe that's a topic for another post. As an aside, PowerShell has structured ways of parameter validation (see about_Functions_Advanced_Parameters) but these will do us no good here since they involve binding and are thus subject to the same problem with enumeration.

For now, I just wanted to demonstrate a common PowerShell-pitfall manifesting in a not-so-common way and an even less common way of determining what the problem is. Happy scripting. :-)

3 comments:

Johannes Hansen said...

Really nice post. Definately saved me some time. :)

Rodrigo said...

This is very nice useful post. Thanks to detailed info.

Anonymous said...

Awesome analysis.