PowerShell is a terrifically flexible tool for getting work done. Unfortunately, this flexibility comes at a price: it's not always obvious what it's doing. This is a problem of scripting languages in general, and PowerShell is no exception. Consider the following code:
function _using($o, [scriptblock] $t) {
try {
&$t
} finally {
if ($o -is [idisposable]) { $o.dispose() }
}
}
This is obviously an attempt to emulate the "using" statement from languages like C# and VB.NET. You can rightfully argue that scripts don't really have a need for it (because scripts are short-running, and resources will be cleaned up when PowerShell exits), but let's leave that argument aside for a moment.
This function has a problem, and a rather serious one at that, but it's not at all obvious what it is. Here's some code to demonstrate the issue (which is pared down to bare essentials to illustrate the problem only):
function Get-SqlData {
$connection = New-Object ([Data.SqlClient.SqlConnection]) "data source=.\sqlexpress;integrated security=sspi"
$connection.Open()
$command = New-Object ([Data.SqlClient.SqlCommand])
$command.Connection = $connection
$command.CommandText = "select 1 as a"
$command.CommandType = "Text"
$datareader = $command.ExecuteReader()
_using ($datareader) {
$datareader.Read()
}
}
What should this code output? It should print the result of the call to .Read(). Since we have one row, you would expect it to output "True". However, it outputs "False", as if no rows are available.
Getting the function to print "True" is easy, and requires only a very slight change:
_using ($null) {
$datareader.Read()
}
If we do not pass the $datareader variable, everything is fine (except, of course, that it doesn't get disposed).
If we do not pass the $datareader variable, everything is fine (except, of course, that it doesn't get disposed).
We can observe what's going wrong, but the exact point where things are going wrong is invisible to us. It's the result of passing a parameter, which is something internal to the host. So, let's crack open the host and look inside!
I'll be assuming you're familiar with WinDbg and debugging managed code. If not, well, I'd love to point you to a good, complete tutorial, but I'm not familiar with one and a cursory search turned up nothing. I do have a very brief overview of Windows debuggers that includes some helpful links, but beyond that you'll just have to sit back and enjoy the ride.
We hook up WinDbg to PowerShell. Unfortunately, calling Debugger.Break() from a PowerShell-script has no effect — there is probably a good reason why and I even vaguely recall having looked into it once for maanged applications in general, but I'm too lazy to look into it now. Never mind, there's more than one way to skin a cat.
In WinDbg, we enter "sxe out:reader" to instruct the debugger to break if the program writes debug output that contains "reader". Now we can use Debugger.Log() to trigger breaks:
$datareader = $command.ExecuteReader()
[diagnostics.debugger]::log(0, "reader", "")
_using ($datareader) {
[diagnostics.debugger]::log(0, "reader", "")
$datareader.Read()
}
OS Thread Id:
0x11f8 (7)
RSP/REG Object Name
r14 000000000302c7a0
System.Management.Automation.MethodInformation
000000001d58d048
00000000034eec20 System.Data.SqlClient.SqlDataReader
000000001d58d080
000000000325e180 System.Data.SqlClient.TdsParserStateObject
000000001d58d098
00000000034eec20 System.Data.SqlClient.SqlDataReader
000000001d58d0a0
00000000034eef10 System.Object[]
(System.Data.SqlClient.SqlBuffer[])
000000001d58d0b0 00000000032613d0 System.Byte[] (System.Byte[])
Dumping the data reader instance with "!dumpobj 00000000034eec20" gives us:
Name:
System.Data.SqlClient.SqlDataReader
MethodTable:
000007fef596df30
EEClass:
000007fef57d65c8
Size: 208(0xd0)
bytes
(C:\Windows\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
MT Field
Offset Type
VT Attr Value Name
000007fef87473f8 400018a
8 System.Object 0 instance 0000000000000000 __identity
000007fef59740a8 40017e0
10 ...lClient.TdsParser 0 instance
000000000325e070 _parser
000007fef59746a0 40017e1
18 ...ParserStateObject 0
instance 000000000325e180 _stateObj
000007fef596d180 40017e2
20 ...Client.SqlCommand 0
instance 00000000034ed000 _command
000007fef596cad0 40017e3
28 ...ent.SqlConnection 0
instance 00000000034e64c0 _connection
000007fef874ed78 40017e4
98 System.Int32 1 instance
1033 _defaultLCID
000007fef8746cd8 40017e5
bc System.Boolean 1 instance
0 _dataReady
000007fef8746cd8 40017e6
bd System.Boolean 1 instance 0 _haltRead
000007fef8746cd8 40017e7
be System.Boolean 1 instance
1 _metaDataConsumed
000007fef8746cd8 40017e8
bf System.Boolean 1 instance 0 _browseModeInfoConsumed
000007fef8746cd8 40017e9
c0 System.Boolean 1 instance
0 _isClosed
000007fef8746cd8 40017ea
c1 System.Boolean 1 instance
1 _isInitialized
000007fef8746cd8 40017eb
c2 System.Boolean 1 instance 1 _hasRows
000007fef5e63490 40017ec
9c System.Int32 1 instance
0 _altRowStatus
000007fef874ed78 40017ed
a0 System.Int32 1 instance
-1 _recordsAffected
000007fef874ed78 40017ee
a4 System.Int32 1 instance
30 _timeoutSeconds
000007fef5975698 40017ef
a8 System.Int32 1 instance
2008 _typeSystem
000007fef5973528 40017f0
30 ...ent.SqlStatistics 0
instance 0000000000000000 _statistics
000007fef87358f8 40017f1
38 System.Object[] 0 instance 00000000034eef10 _data
000007fef5e96628 40017f2
40 ...t.SqlStreamingXml 0
instance 0000000000000000 _streamingXml
000007fef5975af0 40017f3
48 ...t._SqlMetaDataSet 0
instance 00000000034eedb8 _metaData
000007fef5e94f10 40017f4
50 ...DataSetCollection 0
instance 0000000000000000 _altMetaDataSetCollection
000007fef5975c88 40017f5
58 ...e.FieldNameLookup 0
instance 0000000000000000 _fieldNameLookup
000007fef59760b8 40017f6
ac System.Int32 1 instance
0 _commandBehavior
000007fef874ed78 40017f8
b0 System.Int32 1 instance
29 ObjectID
000007fef5e59968 40017f9
60 ...tiPartTableName[] 0
instance 0000000000000000 _tableNames
000007fef8747b08 40017fa
68 System.String 0 instance 0000000000000000 _resetOptionsString
000007fef874ed78 40017fb
b4 System.Int32 1 instance 0 _nextColumnDataToRead
000007fef874ed78 40017fc
b8 System.Int32 1 instance
0 _nextColumnHeaderToRead
000007fef874b028 40017fd
80 System.Int64 1 instance 0 _columnDataBytesRead
000007fef874b028 40017fe
88 System.Int64 1 instance 0 _columnDataBytesRemaining
000007fef874b028 40017ff
90 System.Int64 1 instance 0 _columnDataCharsRead
000007fef8749400 4001800
70 System.Char[] 0 instance 0000000000000000 _columnDataChars
000007fef8747dd0 4001801
78 System.Exception 0 instance 0000000000000000 _rowException
000007fef874ed78 40017f7
9c0 System.Int32 1 shared static _objectTypeCount
>> Domain:Value
00000000002a8400:NotInit <<
Ignore all this for now, as we're not really interested in the details.We'll let the code proceed and hit the second breakpoint, after the DataReader has been passed to _using.
If we dump the object again, we can see the difference in state (and hopefully a clue as to why .Read() will not work):
If we dump the object again, we can see the difference in state (and hopefully a clue as to why .Read() will not work):
Name:
System.Data.SqlClient.SqlDataReader
MethodTable:
000007fef596df30
EEClass:
000007fef57d65c8
Size: 208(0xd0)
bytes
(C:\Windows\assembly\GAC_64\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
MT Field
Offset Type
VT Attr Value Name
000007fef87473f8 400018a
8 System.Object 0 instance 0000000000000000 __identity
000007fef59740a8 40017e0
10 ...lClient.TdsParser 0 instance
0000000000000000 _parser
000007fef59746a0 40017e1
18 ...ParserStateObject 0
instance 000000000325e180 _stateObj
000007fef596d180 40017e2
20 ...Client.SqlCommand 0
instance 00000000034ed000 _command
000007fef596cad0 40017e3
28 ...ent.SqlConnection 0
instance 00000000034e64c0 _connection
000007fef874ed78 40017e4
98 System.Int32 1 instance
1033 _defaultLCID
000007fef8746cd8 40017e5
bc System.Boolean 1 instance
0 _dataReady
000007fef8746cd8 40017e6
bd System.Boolean 1 instance 0 _haltRead
000007fef8746cd8 40017e7
be System.Boolean 1 instance
1 _metaDataConsumed
000007fef8746cd8 40017e8
bf System.Boolean 1 instance 0 _browseModeInfoConsumed
000007fef8746cd8 40017e9
c0 System.Boolean 1 instance
0 _isClosed
000007fef8746cd8 40017ea
c1 System.Boolean 1 instance
1 _isInitialized
000007fef8746cd8 40017eb
c2 System.Boolean 1 instance 1 _hasRows
000007fef5e63490 40017ec
9c System.Int32 1 instance
0 _altRowStatus
000007fef874ed78 40017ed
a0 System.Int32 1 instance
-1 _recordsAffected
000007fef874ed78 40017ee
a4 System.Int32 1 instance
30 _timeoutSeconds
000007fef5975698 40017ef
a8 System.Int32 1 instance
2008 _typeSystem
000007fef5973528 40017f0
30 ...ent.SqlStatistics 0 instance 0000000000000000 _statistics
000007fef87358f8 40017f1
38 System.Object[] 0 instance 00000000034eef10 _data
000007fef5e96628 40017f2
40 ...t.SqlStreamingXml 0
instance 0000000000000000 _streamingXml
000007fef5975af0 40017f3
48 ...t._SqlMetaDataSet 0
instance 00000000034eedb8 _metaData
000007fef5e94f10 40017f4
50 ...DataSetCollection 0
instance 0000000000000000 _altMetaDataSetCollection
000007fef5975c88 40017f5
58 ...e.FieldNameLookup 0
instance 0000000000000000 _fieldNameLookup
000007fef59760b8 40017f6
ac System.Int32 1 instance
0 _commandBehavior
000007fef874ed78 40017f8
b0 System.Int32 1 instance
29 ObjectID
000007fef5e59968 40017f9
60 ...tiPartTableName[] 0
instance 0000000000000000 _tableNames
000007fef8747b08 40017fa
68 System.String 0 instance 0000000000000000 _resetOptionsString
000007fef874ed78 40017fb
b4 System.Int32 1 instance 0 _nextColumnDataToRead
000007fef874ed78 40017fc
b8 System.Int32 1 instance
0 _nextColumnHeaderToRead
000007fef874b028 40017fd
80 System.Int64 1 instance 0 _columnDataBytesRead
000007fef874b028 40017fe
88 System.Int64 1 instance -1 _columnDataBytesRemaining
000007fef874b028 40017ff
90 System.Int64 1 instance 0 _columnDataCharsRead
000007fef8749400 4001800
70 System.Char[] 0 instance 0000000000000000 _columnDataChars
000007fef8747dd0 4001801
78 System.Exception 0 instance 0000000000000000 _rowException
000007fef874ed78 40017f7
9c0 System.Int32 1 shared static _objectTypeCount
>> Domain:Value 00000000002a8400:NotInit <<
Scanning from top to bottom, the thing that's immediately striking is that the "_parser" field is now null. The DataReader object has definitely changed as a result of calling our function, even though it hasn't even done anything with the object at this point — or so it might appear.
Now that we have a foot in the door, we can watch for changes to the _parser field to catch the change in the act. To that end, we run the code again, have it hit the breakpoint again and dump the stack again:
OS Thread Id:
0xe0c (7)
RSP/REG Object Name
r14 000000000302c7a0
System.Management.Automation.MethodInformation
000000001d66c7d8
00000000032dfad0 System.Data.SqlClient.SqlDataReader
...
Let's set a processor breakpoint on the field:
ba r 8
00000000032dfad0+10
This sets a breakpoint, the size of a reference (8 bytes, since we're running on 64-bit) on the location of the _parser field (which has field offset 10, as evidenced in the above dump). As soon as anything accesses this field, we'll know.
Continuing will hit the breakpoint, at a very telling location (!CLRStack):
OS Thread Id:
0xe0c (7)
Child-SP RetAddr Call Site
000000001d66c870
000007fef592977b System.Data.SqlClient.SqlDataReader.ReadInternal(Boolean)
000000001d66c930
000007fee83b1f93 System.Data.Common.DbEnumerator.MoveNext()
000000001d66c9a0
000007fee83afc4c
System.Management.Automation.ParserOps.MoveNext(System.Management.Automation.ExecutionContext,
System.Management.Automation.Token, System.Collections.IEnumerator)
000000001d66ca30
000007fee83af323 System.Management.Automation.ParameterBinderBase.DecodeValue(System.Object,
System.Type, System.Management.Automation.ParameterCollectionTypeInformation,
Boolean)
000000001d66caf0
000007fee8832b92
System.Management.Automation.ParameterBinderBase.BindParameter(System.Management.Automation.CommandParameterInternal,
System.Management.Automation.CompiledCommandParameter,
System.Management.Automation.ParameterBindingFlags)
000000001d66ccf0
000007fee83ae99e System.Management.Automation.ParameterBinderController.BindParameter(UInt32,
System.Management.Automation.CommandParameterInternal,
System.Management.Automation.MergedCompiledCommandParameter,
System.Management.Automation.ParameterBindingFlags)
...
Using Reflector (or, since that's no longer free, ILSpy) to pull apart System.Management.Automation.ParameterBinderBase.DecodeValue gives us the full story.
What's happening here is that PowerShell sees that the DataReader is an enumerable object, so it happily goes ahead and enumerates it to see if it can decompose it into a collection of tokens. PowerShell sees that the DataReader does not produce tokens, so it discards the enumeration and passes the DataReader on as a single object, but the damage is already done. Enumerating a DataReader is something you can only do once, and it consumes the data.
Tokens are something internal to PowerShell, so this is arguably a bug — it's not the user's fault if the PowerShell team couldn't figure out a way of passing around token enumerations without invoking side effects on things that have nothing to do with tokens. This is what I meant when I said flexibility comes at a price: with all the ways PowerShell has for handling enumerations, passing along parameters and doing conversions, something's bound to break, or at least act in a non-intuitive fashion.
The problem is quite fundamental, and not easily resolved without resorting to trickery. By far the simplest way around this is to abandon the idea of _using in the first place. Like I said at the beginning, scripts are not long-lived objects. They shouldn't need disposal mechanisms anyway. If you do need to dispose the occasional expensive object in the middle of a script, you can just insert an explicit call. Quick and practical — your boss will love you.
The second-simplest way is to abandon DataReader. Using DataAdapter and DataTable avoids these problems because in-memory collections can be enumerated as many times as you like with no adverse effects. These classes do consume more memory and processing cannot begin until all data has been read, which could be a problem with big data processing tasks.
Suppose you want to stick with DataReader, or you have another scenario with an IEnumerable-implementing object that you need to pass without invoking the enumeration. There is a workaround: pass the object as a single-element array. This trick is quite common in PowerShell scripting since it's particularly easy to do (put a single comma in front of the expression to turn it into an array) and there are plenty of scenarios for which PowerShell's unwrapping behavior is overly eager.
Obviously, this means you need to change your function to handle single-element arrays. It also places a burden on the caller, who must know to do this for IEnumerable-implementers, but we can check for this in the function itself so it doesn't get overlooked:
What's happening here is that PowerShell sees that the DataReader is an enumerable object, so it happily goes ahead and enumerates it to see if it can decompose it into a collection of tokens. PowerShell sees that the DataReader does not produce tokens, so it discards the enumeration and passes the DataReader on as a single object, but the damage is already done. Enumerating a DataReader is something you can only do once, and it consumes the data.
Tokens are something internal to PowerShell, so this is arguably a bug — it's not the user's fault if the PowerShell team couldn't figure out a way of passing around token enumerations without invoking side effects on things that have nothing to do with tokens. This is what I meant when I said flexibility comes at a price: with all the ways PowerShell has for handling enumerations, passing along parameters and doing conversions, something's bound to break, or at least act in a non-intuitive fashion.
The problem is quite fundamental, and not easily resolved without resorting to trickery. By far the simplest way around this is to abandon the idea of _using in the first place. Like I said at the beginning, scripts are not long-lived objects. They shouldn't need disposal mechanisms anyway. If you do need to dispose the occasional expensive object in the middle of a script, you can just insert an explicit call. Quick and practical — your boss will love you.
The second-simplest way is to abandon DataReader. Using DataAdapter and DataTable avoids these problems because in-memory collections can be enumerated as many times as you like with no adverse effects. These classes do consume more memory and processing cannot begin until all data has been read, which could be a problem with big data processing tasks.
Suppose you want to stick with DataReader, or you have another scenario with an IEnumerable-implementing object that you need to pass without invoking the enumeration. There is a workaround: pass the object as a single-element array. This trick is quite common in PowerShell scripting since it's particularly easy to do (put a single comma in front of the expression to turn it into an array) and there are plenty of scenarios for which PowerShell's unwrapping behavior is overly eager.
Obviously, this means you need to change your function to handle single-element arrays. It also places a burden on the caller, who must know to do this for IEnumerable-implementers, but we can check for this in the function itself so it doesn't get overlooked:
_using (,$datareader) {
$datareader.Read()
}
...
if ($o -is [collections.ienumerable] -and -not ($o -is [array])) {
throw "Wrap IEnumerables in an array
before passing them."
}
try {
&$t
} finally {
$o|%{if ($_ -is [idisposable]) { $_.dispose() }}
}
}
Obviously this is crude: there are IEnumerable-implementers that are not arrays that could be handled correctly with no further tricks, but we take no risks and demand that everything that's not an array be protected.
If you try it out (by removing the comma), you'll notice that the error message isn't really helpful because the location of the error is the function, rather than the caller. There are ways around this too, but maybe that's a topic for another post. As an aside, PowerShell has structured ways of parameter validation (see about_Functions_Advanced_Parameters) but these will do us no good here since they involve binding and are thus subject to the same problem with enumeration.
For now, I just wanted to demonstrate a common PowerShell-pitfall manifesting in a not-so-common way and an even less common way of determining what the problem is. Happy scripting. :-)
3 comments:
Really nice post. Definately saved me some time. :)
This is very nice useful post. Thanks to detailed info.
Awesome analysis.
Post a Comment