Friday, August 24, 2012

Generics and Implicit Conversion Operators Create an Identity Crisis


I keep running into problems with the unit tests for a project at work, in particular when I call Assert.AreEqual(expected, actual).

Some Background Information

Currently, I'm working on some software that will integrate an accounting system with some online banking web services. This is so that instead of manually entering transactions in both the accounting system and then again up at the bank's website, firms can enter them once into the accounting system and the integration software I'm writing will handle making the appropriate web service calls to initiate the actual transactions at the bank.

There is a consortium out there known as the Interactive Financial eXchange Forum, usually known simply as IFX. They're an industry group that writes specifications for the transfer of financial data using XML and web services. Many financial institutions and software vendors, such as Intuit, the maker of Quicken, use specifications from the IFX.

Some of these specifications are available for download for free. The most recent versions (and the most useful material, e.g. XSD schemas) are often available to members only (and that price tag is quite steep). I'm using version 1.7. Within the version 1.7 schemas, there is a type known as NC, which stands for "narrow-character" string. A narrow-character string is basically a string restricted to 7-bit ASCII characters below 0x7F.

My first thought was to write an extension method to convert an ordinary .NET string to a narrow-character string:

public static string UTF8ToLatin1(this string value);

This worked well until I discovered that I had to constantly remember that if a string variable needed to be a narrow-character string, I needed to call that extension method any time I changed its value. Needless to say, I found occasions where I forgot to call the method. So while cool, this was not a good long-term solution.
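To make the problem concrete, here is a minimal sketch of how such an extension method can be forgotten. The post only shows the signature of UTF8ToLatin1, so everything below is an assumption for illustration: the class name NarrowStringExtensions, the helpers IsNarrow and EnsureNarrow, and the choice to throw instead of converting characters.

```csharp
using System;
using System.Linq;

// Hypothetical helpers; the real UTF8ToLatin1 body is not shown in the post.
public static class NarrowStringExtensions
{
    // Assumption: "narrow" means every character is 7-bit ASCII below 0x7F.
    public static bool IsNarrow(this string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        return value.All(c => c < 0x7F);
    }

    // Returns the value unchanged, or throws if it contains a wide character.
    public static string EnsureNarrow(this string value)
    {
        if (!value.IsNarrow())
            throw new ArgumentException("Value contains non-narrow characters.", nameof(value));
        return value;
    }
}

public static class Program
{
    public static void Main()
    {
        string payee = "Acme Corp".EnsureNarrow(); // validated here...
        payee = "Café Olé";                        // ...but nothing forces a re-check here
        Console.WriteLine(payee.IsNarrow());       // False
    }
}
```

The variable is still just a string, so the type system has no way to tell a validated value from an unvalidated one.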

A New Class is Born

So then I decided to get "cute"; but hopefully not too cute. It turns out that string is a sealed class. This is unfortunate, but there's probably a good reason for that. So I decided to write my own wrapper class around a string value, one that provides all the necessary validation while delegating much of its behavior to string, and that stays inheritable (there are several specializations of a narrow-character string used throughout the IFX specification schemas). I called it NCString.

Everything was going pretty well until I realized that I couldn't simply assign a string to an NCString object, or vice versa, without going through some sort of property or method accessor. Well, I don't know about you, but that just smelled to me.

I then went researching to see if there was any way to create some sort of overloaded casting operator (sort of like C++'s static_cast and dynamic_cast operators, and what-not). It turns out .NET has what it calls conversion operators. I had forgotten all about these, mainly because I had never needed to implement one before (except for the occasional ToXXXX/FromXXXX methods).

Conversion Operators

At the CLR level, conversion operators are just static methods with the special names op_Implicit and op_Explicit. Languages that don't support operator overloading can't use cast syntax with them, which is why types often also expose plain methods in the form of ToXXX(...)/FromXXX(...).

In C#, conversion operators get some syntactic sugar. Explicit conversion operators, when used, look like regular casts from one type to another (much like how you can create a constructor in C++ that takes the type to convert from, which then allows you to use the C++ cast convention instead of calling the constructor directly). Implicit operators are even cooler in that you can assign the source object directly to a variable of the target type, as if both were of the same type. Obviously, there's much room for abuse here, so there are some compiler-enforced rules as well as general guidelines on the use of both implicit and explicit conversion operators, which can be found here and here.
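A toy example may help show the difference between the two kinds of operator in C#. The Meters type below is hypothetical and exists only to show the syntax; it is not part of the project described in this post.

```csharp
using System;

// Hypothetical type, used only to show the operator syntax.
public struct Meters
{
    public readonly double Value;
    public Meters(double value) { Value = value; }

    // Implicit: usable in a plain assignment, no cast needed.
    public static implicit operator double(Meters m) { return m.Value; }

    // Explicit: the caller has to opt in with a cast.
    public static explicit operator Meters(double d) { return new Meters(d); }
}

public static class Program
{
    public static void Main()
    {
        Meters m = (Meters)2.5; // explicit operator: cast required
        double d = m;           // implicit operator: plain assignment
        Console.WriteLine(d);

        // The compiler emits both operators as specially named static methods.
        Console.WriteLine(typeof(Meters).GetMethod("op_Implicit") != null); // True
        Console.WriteLine(typeof(Meters).GetMethod("op_Explicit") != null); // True
    }
}
```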

So in my NCString class, I created the required constructors, implemented the requisite object.Equals methods, overloaded the equality operators, and finally created implicit conversion operators, both to and from string. Now, if you took the time to read the guidelines I linked to above, you will notice that I may have violated one of the guidelines:
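For readers who want something concrete, here is a minimal sketch of what a class along these lines might look like. It is not the actual NCString from my project: the validation rule (every character below 0x7F), the null handling, and the member layout are assumptions made for illustration.

```csharp
using System;
using System.Linq;

// A sketch only; the real NCString is not reproduced in this post.
public class NCString : IEquatable<NCString>
{
    private readonly string _value;

    public NCString(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        // Assumption: narrow means 7-bit ASCII below 0x7F.
        if (value.Any(c => c >= 0x7F))
            throw new ArgumentException("Value contains non-narrow characters.", nameof(value));
        _value = value;
    }

    // string -> NCString; validation happens in the constructor.
    public static implicit operator NCString(string value)
    {
        return ReferenceEquals(value, null) ? null : new NCString(value);
    }

    // NCString -> string; always safe.
    public static implicit operator string(NCString value)
    {
        return ReferenceEquals(value, null) ? null : value._value;
    }

    public bool Equals(NCString other)
    {
        return !ReferenceEquals(other, null) && _value == other._value;
    }

    // "as" ignores user-defined conversions, so a plain string yields null here.
    public override bool Equals(object obj) { return Equals(obj as NCString); }
    public override int GetHashCode() { return _value.GetHashCode(); }
    public override string ToString() { return _value; }

    public static bool operator ==(NCString a, NCString b)
    {
        return ReferenceEquals(a, null) ? ReferenceEquals(b, null) : a.Equals(b);
    }

    public static bool operator !=(NCString a, NCString b) { return !(a == b); }
}

public static class Program
{
    public static void Main()
    {
        NCString nc = "Hello World"; // string -> NCString
        string s = nc;               // NCString -> string
        Console.WriteLine(s);                                   // Hello World
        Console.WriteLine(nc.Equals((NCString)"Hello World"));  // True
        Console.WriteLine(nc.Equals((object)"Hello World"));    // False
    }
}
```

The sketch uses ReferenceEquals for its null checks on purpose: once a class converts implicitly to string, a bare comparison against null can end up running through the conversion operator.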

Do not provide an implicit conversion operator if the conversion is potentially lossy.

This is sort of a judgement call. On the one hand, you could consider the conversion from string to NCString "lossy", because a string can hold a wider range of characters than an NCString allows. However, if you try to assign a string with characters outside the allowable range, the NCString constructor throws a System.ArgumentException rather than silently dropping data. Curiously, throwing from a conversion like this is shown as an example in the Framework Design Guidelines documentation, and yet it also seems to violate the following guideline:

Do not throw exceptions from implicit casts.

(If anyone has anything to say about the examples shown in the Framework Design Guidelines versus the guidelines themselves, I'd be interested to hear; leave a comment.)

As I was saying, on the one hand, the implicit conversion from string to NCString could be considered lossy. On the other hand, the two are strings, it's just that one has a more restrictive character set (e.g. it's not like I'm trying to convert from a double to an int, the conversion of which could result in actual data loss). So, in light of that, it would make it much simpler to work with the NCString class if I could convert between instances of NCString and string without needing to explicitly write a cast all the time.

Unit Testing Problems Begin

About this time, some unit testing problems began to rear their ugly heads when it came to calls to Assert.AreEqual(expected, actual). I initially began writing this blog post thinking either that the generic version of this method was not properly inferring the parameter types or that the non-generic version was being called instead of the generic version. Well, it turns out to be the latter rather than the former.

When I began writing unit tests, I would have code similar to the following:

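// Arrange
NCString nc = "Hello World";

// Act (SomeObject is a placeholder for the class under test)
var target = new SomeObject(nc);

// Assert (nc is an NCString, target.NarrowString is a string)
Assert.AreEqual(nc, target.NarrowString);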

/* Results of assertion:
 *
 * Assert.AreEqual failed: Expected<MyNamespace.NCString>  Actual<System.String>
 *
 */

Identity Crisis

I expected that 1) with the method having a generic overload, 2) with the compiler's ability to infer generic method type parameters, and 3) since the two arguments were of different types with conversions from one to the other, actual would be converted to the type of expected, after which the two would be tested for equality.

As I just mentioned above, the C# compiler can most often infer a generic method's type parameters from the arguments, so you don't need to spell them out. This is precisely what's causing me issues. I'm used to letting the compiler infer my generic parameter types (especially when it comes to LINQ statements). But in this case, nothing is being inferred. Why not?

The answer is rather simple; but throw in some implicit conversion operators and the answer becomes more complex. So here's the not-so-simple answer. The compiler did (probably) consider the generic method. But because I didn't specify the generic parameter T and left it to be inferred, there were two candidates for T, string and NCString, and inference only succeeds when it can settle on exactly one. With conversions running in both directions, neither is the obvious winner, so inference fails and the generic overload drops out of consideration. That leaves the overload that takes two object parameters, which both arguments satisfy without any conversion operator being invoked. At run time that overload only sees two objects of different types. User-defined conversion operators are resolved at compile time, so nothing converts the string to an NCString (or vice versa) before the values are compared, and an Equals(object) implementation that only accepts its own type reports them as unequal.
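The overload choice can be observed without a test framework. The sketch below uses two hypothetical stand-ins: MiniAssert, whose Which methods have the same shape as the two-argument AreEqual overloads but only report which one was bound, and Wrapped, a bare-bones wrapper with implicit conversions in both directions. Neither is the real Assert class or the real NCString.

```csharp
using System;

// Stand-in for the two relevant overload shapes (hypothetical, not MSTest).
public static class MiniAssert
{
    public static string Which(object expected, object actual) { return "object"; }
    public static string Which<T>(T expected, T actual) { return "generic<" + typeof(T).Name + ">"; }
}

// Minimal wrapper with implicit conversions in both directions.
public sealed class Wrapped
{
    public readonly string Value;
    public Wrapped(string value) { Value = value; }
    public static implicit operator Wrapped(string s) { return new Wrapped(s); }
    public static implicit operator string(Wrapped w) { return w.Value; }
}

public static class Program
{
    public static void Main()
    {
        Wrapped w = "Hello World";
        string s = "Hello World";

        // Same type on both sides: T is inferred and the generic overload wins.
        Console.WriteLine(MiniAssert.Which(w, w)); // generic<Wrapped>
        Console.WriteLine(MiniAssert.Which(s, s)); // generic<String>

        // Mixed types: inference fails, so only the object overload is left.
        Console.WriteLine(MiniAssert.Which(w, s)); // object
    }
}
```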

Finding Myself

There are two solutions to this problem. One easy, the other somewhat easy.

First, the somewhat easy one (my first approach, mostly because I didn't look to see what else was available in the Assert class). Explicitly tell the compiler what the generic type parameter should be so that the generic method signature is used: Assert.AreEqual<NCString>(expected, actual);. Because the two types are implicitly convertible to/from one another, and I've explicitly specified the generic type parameter to be NCString, the actual parameter will be converted to an NCString and NCString's implementation of Equals will be used.

The really easy solution (for my case) is to use the overload that takes two string parameters and a bool (indicating whether or not the assertion should be handled case-insensitively). Using this overload, I'd have to use the third boolean parameter (in my case, always set to false) and no explicit type casting/conversion would need to be performed. (Note however, that an implicit type conversion would still be performed from NCString to string.)
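Both fixes can be tried side by side in a small console program. As a sketch under stated assumptions: MiniAssert below is a hypothetical stand-in that mirrors the shape of three AreEqual overloads and returns the comparison result instead of throwing, and Wrapped is a stripped-down substitute for NCString. The ordinal string comparison is my simplification and may not match what the real framework does.

```csharp
using System;

// Hypothetical stand-in so the example runs without a test framework.
public static class MiniAssert
{
    public static bool AreEqual(object expected, object actual)
    {
        return object.Equals(expected, actual);
    }

    public static bool AreEqual<T>(T expected, T actual)
    {
        return object.Equals(expected, actual);
    }

    public static bool AreEqual(string expected, string actual, bool ignoreCase)
    {
        return string.Equals(expected, actual,
            ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal);
    }
}

// Stripped-down substitute for NCString.
public sealed class Wrapped
{
    private readonly string _value;
    public Wrapped(string value) { _value = value; }
    public static implicit operator Wrapped(string s) { return new Wrapped(s); }
    public static implicit operator string(Wrapped w) { return w._value; }

    public override bool Equals(object obj)
    {
        var other = obj as Wrapped; // null for anything that is not a Wrapped
        return !ReferenceEquals(other, null) && other._value == _value;
    }

    public override int GetHashCode() { return _value.GetHashCode(); }
}

public static class Program
{
    public static void Main()
    {
        Wrapped expected = "Hello World";
        string actual = "Hello World";

        // The original problem: binds to the object overload.
        Console.WriteLine(MiniAssert.AreEqual(expected, actual));          // False

        // Fix 1: explicit type argument, actual is converted to Wrapped.
        Console.WriteLine(MiniAssert.AreEqual<Wrapped>(expected, actual)); // True

        // Fix 2: string overload, expected is converted to string.
        Console.WriteLine(MiniAssert.AreEqual(expected, actual, false));   // True
    }
}
```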

Which is better? Hard to say, and it mostly depends on what you're trying to test for equality. In my case, both types were strings (one more specialized than the other), so there's not much of a difference one way or the other, and not much more typing either way.

Some Final Questions and Concluding Remarks

So, I wonder, does the .NET Unit Test Framework always use the Assert.AreEqual(expected, actual) overload that takes two object parameters whenever the two parameters are of different types, or just in the case when the two parameters are not the primitives for which an overload exists and a generic type parameter was not specified?
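The overload is picked by the C# compiler at compile time rather than by the test framework at run time, so one way to explore the question is a small probe. OverloadProbe below is hypothetical and only mimics the shape of the two-argument overloads; it suggests that "different types" alone doesn't force the object overload, as long as one type converts to the other through a built-in implicit conversion.

```csharp
using System;

// Hypothetical probe, not the real Assert class: each overload just
// reports which one the compiler bound the call to.
public static class OverloadProbe
{
    public static string Which(object expected, object actual) { return "object"; }
    public static string Which<T>(T expected, T actual) { return "generic<" + typeof(T).Name + ">"; }
}

public static class Program
{
    public static void Main()
    {
        // int converts implicitly to long, so T is inferred as long.
        Console.WriteLine(OverloadProbe.Which(1, 2L));    // generic<Int64>

        // No conversion between int and string: inference fails.
        Console.WriteLine(OverloadProbe.Which(1, "one")); // object
    }
}
```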

Anyway, I just wanted to post this, because this has been bothering me for a while. I didn't expect this behavior, but now that I've worked through it, I understand it. I hope that I can save you many hours of troubleshooting your unit tests.
