dburrows/ blog/ entry/ from-blogspot/ A type-system hack

WARNING: nothing in this post has been implemented or tested, except maybe at a minimal proof-of-concept level ("it compiles!"). It's just some musings on how you can make the computer do your work for you.

I was reading a defense of Hungarian notation when I started to get a familiar notion tickling the back of my brain. The author was explaining that the original idea behind Hungarian was not to replicate type information in variable names, but rather to include some sort of usage information: one example would be giving a string value holding a value that is untrusted the name utName. While it's more reasonable than lpschwzMyVariable, this suggesion made me suspicious that the author was really hacking around suckiness in the C language.

Tagging values with usWhatever and then verifying that you never mix usBlah with Blah isn't just tedious and error-prone, it smells a lot like doing work by hand that the computer ought to do for you. It seemed to me that it ought to be possible to "tag" unsafe values at the type level and let the language's type-checker sort things out. The simple approach is of course to define something like this:

newtype Unsafe t = Unsafe t

NB: I'm using Haskell because it was the first language that sprang to mind when I thought "type system hack".

This newtype declaration lets you attach an Unsafe tag to any type. Presumably the type signatures of routines that return untrusted data (such as a hypothetical getHTMLFormValue or a routine to retrieve previously entered form values from a database, if they aren't sanitized first) should be modified; for instance, if the signature of getHTMLFormValue was

getHTMLFormValue :: HTMLForm -> String -> String

it should instead read

getHTMLFormValue :: HTMLForm -> String -> Unsafe String

To create a new Unsafe value you simply apply the Unsafe constructor to the value:

getHTMLFormValue = Unsafe (oldGetHTMLFormValue)

To unwrap Unsafes you use pattern-matching:

let (Unsafe unwrapped) = wrapped in ...

So, this sort of works, but it's unsatisfying on several levels:

While it may not be perfect -- it only makes sure that you don't accidentally violate safety, and doesn't stop you from deliberately doing something dumb -- this approach feels a lot safer to me than ad-hoc mangling of variable names. I used Haskell because you can do type hacks fairly easily in it, but I don't see why you couldn't play a similar game (albeit much less conveniently) in C++, or even in a dynamic language like Python. Of course, Python would give you runtime errors about safety violations instead of static ones, but that still beats getting runtime security holes.

I would add as a hypothesis that it should be possible to express just about any variable-name-mangling technique whose goal is to "make wrong code look wrong" in the type system, provided your type system doesn't suck; and that doing so is preferable to doing it manually, since it means that a lot of repetitive error-checking can be handled by the computer. And since the computer is better than you are at verifying formal invariants, forcing it to handle them whenever possible is always a good idea.

Comment by Adam G at 12:33 PM:

I'm still pretty new to Haskell and to fancy type theory in general, but it seems to me that Unsafe is a perfect candidate for monad-hood...

Comment by dburrows at 11:26 PM:

That's quite possibly true. I must admit that because of my language history, I tend to view Haskell as a sort of fancy ML, and so I haven't really explored the full power of monads yet. (I do have a book on category theory on my Amazon wishlist...)