Information Maximization

Information maximization is something I don’t often see discussed in the context of software design or development, but I find it to be of great importance. Important enough for me to consider it a principle; we might call it the “information maximization principle”.

Consider a software application as an information processing machine composed of a series of interconnected functions combined to achieve some specific outcome. Each of the functions making up the machine takes some input and returns some output. The output returned is usually a combination of the input and some knowledge / information contained within the function about what to do with the input. So within each function of the information processing machine, some new information is extracted, constructed, or computed.

However, code is often written such that valuable information is dropped: a function returns (or acts on, through some side effect) only a fraction of the total information, just enough to satisfy an immediate, localized need. This has some negative repercussions, e.g.,

In a previous article I gave an example of how to leverage statically typed languages to encode our domain in the type system, thus allowing the compiler to verify some of our domain. The exercise was based on the following very simple specification for a task management application:

Let’s now work through this same example from the angle of information maximization. I’ll try to show why dropping information is a problem, and why the alternative of preserving, type-encoding, and passing along the new information is much more advantageous.

Let’s start like we did before, implementing the spec with the following types:

open System // Guid lives in the System namespace

type Role =
    | Administrator
    | Editor
    | Viewer

type UserId = UserId of Guid

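type User = { Id: UserId; Name: string; Role: Role }

// State is not defined in this excerpt. Only ToDo is used below; the other cases are assumed.
type State =
    | ToDo
    | InProgress
    | Done

type TaskId = TaskId of Guid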

type Task =
    { Id: TaskId
      Title: string
      State: State
      Assignee: Option<UserId> }

…and the following functions: createTask, to create a new task, and changeState, to change the state of an existing task. As we did before, we start with the following implementations:

let createTask (user: User) (title: string): Option<Task> =
    match user.Role with
    | Administrator ->
        Some
            { Id = TaskId(Guid.NewGuid())
              Title = title
              State = ToDo
              Assignee = None }
    | Editor -> None
    | Viewer -> None

let changeState (user: User) (state: State) (task: Task) : Option<Task> =
    match user.Role with
    | Administrator
    | Editor -> Some { task with State = state }
    | Viewer -> None

I note again that these implementations are not good because they have a double duty: each one both verifies that the user is authorized and performs the task operation itself.

With this implementation, any authorization check needed later in any other place (e.g., in an assignTask function added later) would require duplicating the authorization code. As we did before, let’s pull that code out into an Authorization module. This time, however, we’ll do it more naively. Instead of returning an Auth type, as we did before, we will simply validate that the user has authorization, and return a Boolean:

let canCreateTask (user: User) : bool =
    match user.Role with
    | Administrator -> true
    | Editor
    | Viewer -> false

let canEditTask (user: User) : bool =
    match user.Role with
    | Administrator
    | Editor -> true
    | Viewer -> false

But now, what should our Task functions look like? They no longer need the User parameter to verify the Role, so do we just remove it?

let createTask title : Task =
    { Id = TaskId(Guid.NewGuid())
      Title = title
      State = ToDo
      Assignee = None }

let changeState (state: State) (task: Task) : Task = { task with State = state }

Now, how do we protect these functions from being called in a context where the spec would not allow it? To verify authorization, we need to invoke the Authorization functions before calling these Task functions, e.g.:

if canCreateTask user then
    createTask "A title" |> Some
else
    None

But this is so fragile! Who’s stopping us from just calling createTask directly, bypassing the call to canCreateTask, knowingly or otherwise?
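To make the fragility concrete, here is a minimal, self-contained sketch. The one-field Task type and the sample user named Vera are assumptions made to keep the example short; they are not part of the original spec. The point is only that the compiler accepts the unguarded call just as readily as the guarded one.

```fsharp
open System

type Role =
    | Administrator
    | Editor
    | Viewer

type UserId = UserId of Guid

type User = { Id: UserId; Name: string; Role: Role }

// Simplified Task (assumption): only the field this sketch needs.
type Task = { Title: string }

let canCreateTask (user: User) : bool =
    match user.Role with
    | Administrator -> true
    | Editor
    | Viewer -> false

let createTask (title: string) : Task = { Title = title }

// Hypothetical user who must not be allowed to create tasks.
let viewer: User =
    { Id = UserId(Guid.NewGuid()); Name = "Vera"; Role = Viewer }

// The intended call site: check first, then create.
let guarded: Option<Task> =
    if canCreateTask viewer then Some(createTask "A title") else None

// The problem: this compiles too, and no authorization check ever runs.
let unguarded: Task = createTask "A title"
```

Nothing in the signature of createTask records that a check was supposed to happen, so the type checker has nothing to enforce.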

This second point is what I want to highlight here. That canCreateTask function is trading a rich type that has a fair amount of valuable information (the user’s role, among other things) for a single bit. Definitely not the best trade. Instead, the function should return a type that captures all the information it has computed, just as we did in the previous article:

type Auth<'p> =
    private { permission: 'p } 

    member this.Permission = this.permission

type CreateTask = CreateTask of UserId
type EditTask = EditTask of UserId

let getCreateTaskAuth (user: User) : Option<Auth<CreateTask>> =
    match user.Role with
    | Administrator -> Some { permission = CreateTask user.Id }
    | Editor
    | Viewer -> None

We can even take this a bit further and have the getCreateTaskAuth function return the entire User passed in as an argument, rather than just its Id:

type CreateTask<'a> = CreateTask of 'a
type EditTask<'a> = EditTask of 'a

let getCreateTaskAuth (user: User) : Option<Auth<CreateTask<User>>> =
    match user.Role with
    | Administrator -> Some { permission = CreateTask user }
    | Editor
    | Viewer -> None
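As a rough illustration of what the richer return type preserves, the sketch below reads the User back out of the authorization value. The describeCreator helper and the sample users are hypothetical, and placing Auth inside a module named Authorization is an assumption about how the code is organized; it is what makes the private constructor meaningful.

```fsharp
open System

type Role =
    | Administrator
    | Editor
    | Viewer

type UserId = UserId of Guid

type User = { Id: UserId; Name: string; Role: Role }

module Authorization =
    // The record field is private, so code outside this module cannot forge an Auth value.
    type Auth<'p> =
        private { permission: 'p }
        member this.Permission = this.permission

    type CreateTask<'a> = CreateTask of 'a

    let getCreateTaskAuth (user: User) : Option<Auth<CreateTask<User>>> =
        match user.Role with
        | Administrator -> Some { permission = CreateTask user }
        | Editor
        | Viewer -> None

open Authorization

// Hypothetical helper: the authorization value still carries the whole User,
// so downstream code can use it without taking the user as a separate argument.
let describeCreator (auth: Auth<CreateTask<User>>) : string =
    let (CreateTask user) = auth.Permission
    sprintf "authorized creator: %s" user.Name

// Hypothetical sample users.
let admin: User =
    { Id = UserId(Guid.NewGuid()); Name = "Ada"; Role = Administrator }

let viewer: User =
    { Id = UserId(Guid.NewGuid()); Name = "Vera"; Role = Viewer }

let adminDescription = getCreateTaskAuth admin |> Option.map describeCreator
let viewerDescription = getCreateTaskAuth viewer |> Option.map describeCreator
```

A Boolean result could not support describeCreator at all; the caller would have to pass the user along separately and trust that it was the same user that had been checked.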

Now, as we did before, we can have the createTask function demand a type that must be computed before createTask can be called:

let createTask (_: Auth<CreateTask<User>>) (title: string) : Task =
    { Id = TaskId(Guid.NewGuid())
      Title = title
      State = ToDo
      Assignee = None }
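Putting the pieces together, a call site might look like the sketch below. The getEditTaskAuth function, the guarded changeState, the two-case State type, and the sample users are all assumptions added for illustration; the article itself only shows the createTask side.

```fsharp
open System

type Role =
    | Administrator
    | Editor
    | Viewer

type UserId = UserId of Guid
type User = { Id: UserId; Name: string; Role: Role }

// Assumption: only ToDo appears in the article; Done is added for illustration.
type State =
    | ToDo
    | Done

type TaskId = TaskId of Guid
type Task = { Id: TaskId; Title: string; State: State; Assignee: Option<UserId> }

module Authorization =
    type Auth<'p> =
        private { permission: 'p }
        member this.Permission = this.permission

    type CreateTask<'a> = CreateTask of 'a
    type EditTask<'a> = EditTask of 'a

    let getCreateTaskAuth (user: User) : Option<Auth<CreateTask<User>>> =
        match user.Role with
        | Administrator -> Some { permission = CreateTask user }
        | Editor
        | Viewer -> None

    // Hypothetical counterpart for editing, mirroring canEditTask.
    let getEditTaskAuth (user: User) : Option<Auth<EditTask<User>>> =
        match user.Role with
        | Administrator
        | Editor -> Some { permission = EditTask user }
        | Viewer -> None

open Authorization

let createTask (_: Auth<CreateTask<User>>) (title: string) : Task =
    { Id = TaskId(Guid.NewGuid()); Title = title; State = ToDo; Assignee = None }

// Hypothetical: changeState guarded the same way as createTask.
let changeState (_: Auth<EditTask<User>>) (state: State) (task: Task) : Task =
    { task with State = state }

// Hypothetical sample users.
let admin: User = { Id = UserId(Guid.NewGuid()); Name = "Ada"; Role = Administrator }
let viewer: User = { Id = UserId(Guid.NewGuid()); Name = "Vera"; Role = Viewer }

// The only way to reach createTask is through a successful authorization.
let created = getCreateTaskAuth admin |> Option.map (fun auth -> createTask auth "A title")
let denied = getCreateTaskAuth viewer |> Option.map (fun auth -> createTask auth "A title")

// Chaining: change the state only if both the task and the edit permission exist.
let finished =
    match created, getEditTaskAuth admin with
    | Some task, Some auth -> Some(changeState auth Done task)
    | _ -> None
```

Outside the Authorization module, writing createTask "A title" no longer type-checks, and an Auth value cannot be constructed by hand, so the check cannot be skipped by accident.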

So there you have it: a small example that, hopefully, demonstrates well enough what I mean by maximizing information. Alexis King calls this “parse, don’t validate”, and she makes a great case for this very point: don’t discard information. The same essential idea is broadly captured by Eric Steven Raymond’s² partial list of UNIX philosophy prescriptions:

I’ll end with some recommended practices that follow from this idea of information maximization:

¹ This was the main point in the previous article, and was solved by encoding the authorization requirement in the function’s input parameter type.
² “The Art of Unix Programming” (Boston: Addison-Wesley, 2004).