Saturday, April 05, 2014

Haskell error reporting with locations, update

Since some people (I'm among them) dislike impure features in Haskell, I thought I'd present a slight variation on the error location feature that is "pure".

First, the __LOCATION__ variable gets an abstract type. So

  data Location
  __LOCATION__ :: Location

It's defined in the Prelude and always in scope. The type cannot be compared, shown, or anything. There's just one thing that can be done, namely:

  extractLocation :: Location -> IO String

The error function needs a new exception to throw

  data ErrorCallLoc = ErrorCallLoc Location String

  {-# LOCATIONTRANSPARENT error #-}
  error :: String -> a
  error s = throw (ErrorCallLoc __LOCATION__ s)

This means that the location string cannot be used when we throw the error. But it can be used where the error is caught, since this can only be done in the IO monad.

Under the hood everything is just as before; Location is just a string. It just can't be manipulated except in the IO monad, so we can pretend it's pure.

  newtype Location = Location String
  extractLocation (Location s) = return s

It now looks a lot like Michael Snoyman's proposal.
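
To make the "thrown purely, inspected in IO" idea concrete, here is a minimal sketch that compiles with today's GHC. Since GHC has no __LOCATION__, the location is passed by hand; errorAt and describeFailure are names invented for this illustration, and the Show instance is an assumption needed only to satisfy the Exception class.

```haskell
import Control.Exception (Exception, catch, evaluate, throw)

-- Abstract in the proposal; a plain newtype under the hood, as in the post.
newtype Location = Location String

extractLocation :: Location -> IO String
extractLocation (Location s) = return s

data ErrorCallLoc = ErrorCallLoc Location String

-- Assumption: Show ignores the location, so it stays opaque to pure code.
instance Show ErrorCallLoc where
  show (ErrorCallLoc _ msg) = msg

instance Exception ErrorCallLoc

-- Hypothetical stand-in for the location transparent error: the caller
-- supplies the location explicitly because GHC has no __LOCATION__.
errorAt :: Location -> String -> a
errorAt loc msg = throw (ErrorCallLoc loc msg)

-- Force a value; if it throws ErrorCallLoc, recover the location in IO.
describeFailure :: a -> IO (Either String a)
describeFailure x =
  (Right <$> evaluate x) `catch` \(ErrorCallLoc loc msg) -> do
    l <- extractLocation loc
    return (Left (l ++ ": " ++ msg))

main :: IO ()
main = do
  r <- describeFailure (errorAt (Location "test/H.hs:6:9") "undefined" :: Int)
  putStrLn (either id show r)
```

The pure code only ever moves the Location value around; the string comes out only inside the handler, which runs in IO.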

Friday, April 04, 2014

Haskell error reporting with locations

Error reporting in GHC is not always the nicest. For example, I often develop code by using undefined as a placeholder for code I've not written yet. Here's a simple example:

  import System.Environment

  main = do
    args <- getArgs
    if null args then
      undefined
     else
      undefined

Running this looks like this:

  $ ./H
  H: Prelude.undefined

Which undefined caused that? Looking at the error message we have no idea. Wouldn't it be nice with some location information?

We can actually get location information by using Control.Exception.assert:

  import Control.Exception(assert)
  import System.Environment

  main = do
    args <- getArgs
    if null args then
      assert False undefined
     else
      assert False undefined

Now running it is much more informative:

  $ ./H
  H: H.hs:7:9-14: Assertion failed

How is assert able to report the location? If we dig deep enough we discover that it's because GHC contains a special hack to recognize this function and give it location information.

A generalized hack

In a Haskell compiler that I've implemented I've taken this compiler hack and extended it so it can be used for any function.  It comes in two parts, location information and location transparent definitions.

__LOCATION__

The __LOCATION__ identifier is always defined and utterly magical. Its value is a string that describes the location of that very identifier. This is the very opposite of a referentially transparent name. In fact, its value varies with where it is placed in the code! So it's definitely not for purists. But I'm a practical man, so I sometimes have to resort to the ugliness of reality. And in reality we want to report locations in errors.

Enough philosophy, here's an example:

  main = do
    print __LOCATION__
    print   __LOCATION__

And running it prints:

  "test/Test.hs:2:11"
  "test/Test.hs:3:13"

And to illustrate the impurity:

  main = do
    let loc = __LOCATION__
    print loc
    print loc

And running this:

  "test/Test.mu:2:15"
  "test/Test.mu:2:15"

Location transparency

The __LOCATION__ identifier gives the location of itself. This is of little use on its own. Imagine the definition we could give for undefined. Somewhere in the Prelude module it could say something like

  undefined = error ("undefined: " ++ __LOCATION__)

But if we use this all that it will tell us is where the definition of undefined is, not where it was used.

To get the point of use instead of the definition I've introduced location transparent definitions. In a location transparent definition the __LOCATION__ identifier will not refer to its own position, but to the position of the reference to the definition. Location transparency is introduced with a pragma.

  {-# LOCATIONTRANSPARENT undefined #-}
  undefined = error ("undefined: " ++ __LOCATION__)

With this definition our initial example looks like this when we run it:

  undefined: test/H.hs:6:9

In fact, the real definition of undefined doesn't look like that. The __LOCATION__ identifier is only used in the definition of error, so it looks something like this:

  {-# LOCATIONTRANSPARENT error #-}
  error :: String -> a
  error s = throw (ErrorCall (__LOCATION__ ++ ": " ++ s))

  {-# LOCATIONTRANSPARENT undefined #-}
  undefined = error "undefined"

Since both error and undefined are transparent, any use of undefined will be reported with the location of the use.

Furthermore, we can make a few more functions location transparent, e.g.,

  {-# LOCATIONTRANSPARENT head #-}
  head :: [a] -> a
  head [] = error "Empty list"
  head (x:xs) = x

A simple example:

  main = putStr (head [])

Which will print:

  test/Head.hs:1:16: Empty list

which is the location where head was called.
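
GHC has neither __LOCATION__ nor the LOCATIONTRANSPARENT pragma, but readers who want to experiment can approximate the effect with the HasCallStack constraint from GHC.Stack (base 4.9 and later). The sketch below is only an analogy and not the mechanism described in this post; renderTop and locatedHead are names made up for the illustration.

```haskell
import Control.Exception (ErrorCall (..), evaluate, try)
import GHC.Stack (CallStack, HasCallStack, SrcLoc (..), callStack, getCallStack)

-- Render the innermost recorded call site as file:line:column.
renderTop :: CallStack -> String
renderTop cs = case getCallStack cs of
  (_, loc) : _ ->
    srcLocFile loc ++ ":" ++ show (srcLocStartLine loc)
                   ++ ":" ++ show (srcLocStartCol loc)
  []           -> "<unknown>"

-- The HasCallStack constraint plays the role of the pragma: inside the
-- function, callStack refers to the place where locatedHead was used.
locatedHead :: HasCallStack => [a] -> a
locatedHead []      = errorWithoutStackTrace (renderTop callStack ++ ": Empty list")
locatedHead (x : _) = x

main :: IO ()
main = do
  r <- try (evaluate (locatedHead ([] :: [Int])))
  case r of
    Left (ErrorCall msg) -> putStrLn msg  -- reports the call site in main
    Right x              -> print x
```

As with the pragma, the constraint has to be repeated on every function in the chain that should be "transparent"; a function without it becomes the reported location.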

Implementation

There are different ways to implement this feature, and I'm going to sketch two of them.

First: Every function that has the LOCATIONTRANSPARENT pragma will be inlined at the point of use, and the __LOCATION__ identifier in the inlined code will be updated to reflect the call site. The definitions must be processed in a bottom-up fashion for this to work. It's fairly simple to implement, but will cause some code bloat due to inlining.

Second: Every function that has the LOCATIONTRANSPARENT pragma will be rewritten (by the compiler) to have an extra location argument, and each use of this function will be rewritten to pass in the current location. For example (using $$f for the location version of f):

  main = putStr ($$head __LOCATION__ [])

  $$head __LOCATION__ [] = $$error __LOCATION__ "Empty list"
  $$head __LOCATION__ (x:xs) = x
  $$error __LOCATION__ s = throw (ErrorCall (__LOCATION__ ++ ": " ++ s))

This should be fairly straightforward to implement, but I've not tried it. (It's somewhat like dynamic binding, so maybe ghc could reuse that mechanism for locations.)
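
The result of that rewrite can be written out by hand in ordinary Haskell. The following is a sketch of what the compiler output might look like, assuming a location is just a String; headL and errorL are hypothetical names standing in for $$head and $$error, and the location literal is typed in manually where the compiler would insert it.

```haskell
import Control.Exception (ErrorCall (..), evaluate, throw, try)

-- Assumption for this sketch: a location is a plain string.
type Loc = String

-- Location-passing version of error ($$error in the text).
errorL :: Loc -> String -> a
errorL loc s = throw (ErrorCall (loc ++ ": " ++ s))

-- Location-passing version of head ($$head in the text); it forwards
-- the location it was given instead of using its own.
headL :: Loc -> [a] -> a
headL loc []      = errorL loc "Empty list"
headL _   (x : _) = x

main :: IO ()
main = do
  -- The compiler would fill in the call site; here it is written by hand.
  r <- try (evaluate (headL "test/Head.hs:1:16" ([] :: [Int])))
  case r of
    Left (ErrorCall msg) -> putStrLn msg
    Right x              -> print x
```

Running this prints test/Head.hs:1:16: Empty list, the same message as in the head example, at the cost of one extra argument per call.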

And, of course, the global __LOCATION__ identifier has to be recognized by the compiler and replaced by a string that is its location.

Conclusion

I implemented the __LOCATION__ hack quite a while ago, and I like the much improved reporting of error locations. I hope someone will add it to ghc as well.

Thursday, April 03, 2014

A small Haskell extension

The extension

In Haskell you can give a type to an expression by writing expr ::  type.  To an untrained eye the :: looks just like an infix operator, even if it is described as a special syntactical construct in the Haskell report.  But let's think about it as an infix operator for a moment.

For an infix operator you can form a section, i.e., a use of the operator with one operand left out.  For instance (* 2) leaves out the first operand, and Haskell defines this to be the same as (\ x -> x * 2).  Regarding :: as an operator we should be able to write (:: type) and it should have the obvious meaning (\ x -> x :: type).

I suggest, and I plan to send this to the haskell-prime mailing list, that Haskell should adopt this small extension.

Why?

First, the extension is very lightweight and adds almost no extra intellectual weight for anyone learning Haskell.  I'd argue it makes the language simpler because it allows :: to be treated more like an infix operator.  But without use cases this would probably not be enough of an argument.

Example 1

We want to make a function, canonDouble, that takes a string representing a Double and changes it to the standard Haskell string representing this Double.  E.g. canonDouble "0.1e1" == "1.0".  A first attempt might look like this:

  canonDouble :: String -> String
  canonDouble = show . read         -- WRONG!

This is, of course, wrong since the compiler cannot guess that the type between read and show should be a Double.  We can convey this type information in different ways, e.g.:

  canonDouble :: String -> String
  canonDouble = show . asDouble . read
    where asDouble :: Double -> Double
          asDouble x = x

This is somewhat clumsy.  Using my proposed extension we can instead write:

  canonDouble :: String -> String
  canonDouble = show . (:: Double) . read

This has the obvious meaning, and succinctly describes what we want.
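
Until a compiler supports the section syntax, the same function can be written with the lambda that (:: Double) is defined to mean. This is a small sketch in plain Haskell; nothing in it depends on the proposed extension.

```haskell
-- (:: Double) is proposed to mean (\x -> x :: Double), so this is the
-- desugared form of  show . (:: Double) . read
canonDouble :: String -> String
canonDouble = show . (\x -> x :: Double) . read

main :: IO ()
main = putStrLn (canonDouble "0.1e1")  -- prints 1.0
```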

Example 2

In ghc 7.8 there is a new, better implementation of Data.Typeable.  It used to be (before ghc 7.8) that to get a TypeRep for some type you would have to have a value of that type.  E.g., typeOf True gives the TypeRep for the Bool type.  If we don't have a value handy of the type, then we will have to make one, e.g., by using undefined.  So we could write typeOf (undefined :: Bool).

This way of using undefined is rather ugly, and relies on non-strictness to work.  Ghc 7.8 adds a new, cleaner way of doing it.

  typeRep :: proxy a -> TypeRep

The typeRep function does not need an actual value, but just a proxy for the value.  A common proxy is the Proxy type from Data.Proxy:

  data Proxy a = Proxy

Using this type we can now get the TypeRep of a Bool by writing typeRep (Proxy :: Proxy Bool).  Note that in the type signature of typeRep the proxy is a type variable.  This means we can use other values as proxies, e.g., typeRep ([] :: [Bool]).

We can in fact use anything as a proxy that has a structure that unifies with proxy a.  For instance, if we want a proxy for the type T we could use T -> T, which is the same as (->) T T.  The (->) T part makes up the proxy and the last T makes up the a.

The extension I propose provides an easy way to write a function of type T -> T, just write (:: T).  So to get a TypeRep for Bool we can simply write typeRep (:: Bool).  Doesn't that look (deceptively) simple?
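
The different proxies can be compared directly with Data.Typeable as it ships with ghc 7.8 and later. In this sketch the last entry uses the lambda that (:: Bool) would desugar to, since the section syntax itself is only proposed; boolReps is a name chosen for the example.

```haskell
import Data.Proxy (Proxy (..))
import Data.Typeable (TypeRep, typeOf, typeRep)

-- Four ways to obtain the TypeRep of Bool.
boolReps :: [TypeRep]
boolReps =
  [ typeOf True                    -- needs an actual value
  , typeRep (Proxy :: Proxy Bool)  -- proxy = Proxy,     a = Bool
  , typeRep ([] :: [Bool])         -- proxy = [],        a = Bool
  , typeRep (\x -> x :: Bool)      -- proxy = (->) Bool, a = Bool
  ]

main :: IO ()
main = mapM_ print boolReps  -- prints Bool four times
```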

In fact, my driving force for coming up with this language extension was to get an easy and natural way to write type proxies, and I think using (:: T) for a type proxy is as easy and natural as it gets (even if the reason it works is rather peculiar).

Implementation

I've implemented the extension in one Haskell compiler and it was very easy to add and it works as expected.  Since it was so easy, I'll implement it for ghc as well, and the ghc maintainers can decide if they want to merge it.  I suggest this new feature be available using the language extension name SignatureSections.

Extensions

Does it make sense to do a left section of ::?  I.e., does (expr ::) make sense?  In current Haskell that does not make sense, since it would be an expression that lacks an argument that is a type.  Haskell doesn't currently allow explicit type arguments, but if it ever does, this could be considered.

With the definition that (:: T) is the same as (\ x -> x :: T), any use of quantified or qualified types as T will give a type error.  E.g., (:: [a]), which is (\ x -> x :: [a]), is a type error.  You could imagine a different desugaring of (:: T), namely (id :: T -> T).  Now (:: [a]) desugars to (id :: [a] -> [a]), which is type correct.  In general, we have to keep quantifiers and qualifiers at the top, i.e., (:: forall a . a) turns into (id :: forall a . a -> a).
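
The difference between the two desugarings can be tried in plain Haskell. In this sketch asList is a name picked for the example; the lambda form is left as a comment because the type checker rejects it (the annotation claims the lambda-bound x has the type [a] for every a).

```haskell
-- The (id :: T -> T) desugaring of (:: [a]): the type variable is
-- quantified at the top of the signature, so this type checks.
asList :: [a] -> [a]
asList = (id :: [a] -> [a])

-- The (\x -> x :: T) desugaring of (:: [a]) does not type check:
-- asList' = (\x -> x :: [a])

main :: IO ()
main = do
  print (asList [1, 2, 3 :: Int])
  putStrLn (asList "abc")
```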

Personally, I'm not convinced this more complex desugaring is worth the extra effort.
