23 June 2023
The importance of information maximization
This is something I don’t often see discussed in the context of software design or development, but I find it important enough to consider it a principle; we might call it the “information maximization principle”.
Consider a software application as an information processing machine composed of a series of interconnected functions combined to achieve some specific outcome. Each of the functions making up the machine takes some input and returns some output. The output returned is usually a combination of the input and some knowledge / information contained within the function about what to do with the input. So within each function of the information processing machine, some new information is extracted, constructed, or computed.
However, many times code is written such that valuable information is dropped, returning instead (or performing some side effect) only a fraction of the total information; just enough to satisfy an immediate, localized need. This has some negative repercussions. e.g.,
In a previous article I gave an example of how to leverage statically typed languages to encode our domain in the type system, thus allowing the compiler to verify some of our domain. The exercise was based on the following very simple specification for a task management application:
Let’s now work through this same example from the angle of information maximization. I’ll try to show why dropping information is a problem, and why the alternative of preserving, type-encoding, and passing along the new information is much more advantageous.
Let’s start like we did before, implementing the spec with the following types:
type Role =
    | Administrator
    | Editor
    | Viewer
type UserId = UserId of Guid
type User = {Id: UserId; Name: string; Role: Role}
type TaskId = TaskId of Guid
type Task =
    { Id: TaskId
      Title: string
      State: State
      Assignee: Option<UserId> }
…and the following functions: createTask, to create a new task, and changeState, to change the state of an existing task. As we did before, we start with the following implementations:
let createTask (user: User) (title: string) : Option<Task> =
    match user.Role with
    | Administrator ->
        Some { Id = TaskId(Guid.NewGuid())
               Title = title
               State = ToDo
               Assignee = None }
    | Editor -> None
    | Viewer -> None
let changeState (user: User) (state: State) (task: Task) : Option<Task> =
    match user.Role with
    | Administrator
    | Editor -> Some { task with State = state }
    | Viewer -> None
I note again that these implementations are not good because they have a double duty:
With this implementation, any authorization verification required later on in any other place (e.g., an assignTask function added later) would require duplicating the authorization verification code. As we did before, let’s pull that code out into an Authorization module. This time, however, we’ll do it more naively. Instead of returning an Auth type, as we did before, we will simply validate that the user has authorization, and return a Boolean:
let canCreateTask (user: User) : bool =
    match user.Role with
    | Administrator -> true
    | Editor
    | Viewer -> false
let canEditTask (user: User) : bool =
    match user.Role with
    | Administrator
    | Editor -> true
    | Viewer -> false
But now, what should our Task functions look like? We don’t need the User parameter to verify the Role, so just remove it?
let createTask title : Task =
    { Id = TaskId(Guid.NewGuid())
      Title = title
      State = ToDo
      Assignee = None }
let changeState (state: State) (task: Task) : Task = { task with State = state }
Now, how do we protect these functions from being called in a context where the spec would not allow it? To verify authorization, we need to invoke the Authorization functions before calling these Task functions, e.g.:
if canCreateTask user then
    createTask "A title" |> Some
else
    None
But this is so fragile! Who’s stopping us from just calling createTask directly, bypassing the call to canCreateTask, knowingly or otherwise?
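To make the fragility concrete, here is a minimal, self-contained sketch of the Boolean approach. The State cases other than ToDo and the sample user are assumptions made only for illustration, since they are not shown in this article.

```fsharp
open System

type Role = Administrator | Editor | Viewer
type UserId = UserId of Guid
type User = { Id: UserId; Name: string; Role: Role }
// Assumption: only ToDo appears in the article; Done is a placeholder case.
type State = ToDo | Done
type TaskId = TaskId of Guid
type Task = { Id: TaskId; Title: string; State: State; Assignee: Option<UserId> }

let canCreateTask (user: User) : bool =
    match user.Role with
    | Administrator -> true
    | Editor | Viewer -> false

let createTask (title: string) : Task =
    { Id = TaskId(Guid.NewGuid()); Title = title; State = ToDo; Assignee = None }

// A hypothetical viewer, who must not be able to create tasks.
let viewer : User = { Id = UserId(Guid.NewGuid()); Name = "Vera"; Role = Viewer }

// The guarded path refuses the viewer, as the spec requires...
let guarded = if canCreateTask viewer then Some(createTask "A title") else None

// ...but nothing ties createTask to the check, so this compiles and runs too.
let unguarded = createTask "A title"
```

The compiler accepts both calls; only convention keeps the second one from being written.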
There are two big problems with this implementation:
There is nothing in createTask stating that it should be called only in the context of a particular authorization.[1]
This second point is what I want to highlight here. That canCreateTask function is trading a rich type that has a fair amount of valuable information (the user’s role, among other things) for a single bit. Definitely not the best trade. Instead, the function should return a type that captures all the information it has computed, just as we did in the previous article:
type Auth<'p> =
    private { permission: 'p }
    member this.Permission = this.permission
type CreateTask = CreateTask of UserId
type EditTask = EditTask of UserId
let getCreateTaskAuth (user: User) : Option<Auth<CreateTask>> =
    match user.Role with
    | Administrator -> Some { permission = CreateTask user.Id }
    | Editor | Viewer -> None
We can even take this a bit further and have the getCreateTaskAuth function return the entire User passed in as an argument, rather than just its Id:
type CreateTask<'a> = CreateTask of 'a
type EditTask<'a> = EditTask of 'a
let getCreateTaskAuth (user: User) : Option<Auth<CreateTask<User>>> =
    match user.Role with
    | Administrator -> Some { permission = CreateTask user }
    | Editor | Viewer -> None
Now, as we did before, we can have the createTask function demand a type that must be computed before createTask can be called:
let createTask (_: Auth<CreateTask<User>>) (title: string) : Task =
    { Id = TaskId(Guid.NewGuid())
      Title = title
      State = ToDo
      Assignee = None }
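As a rough end-to-end sketch of how these pieces might fit together, the following puts the Auth type in an Authorization module and threads it through to createTask. The tryCreateTask and authorizedName helpers, the State cases other than ToDo, and the sample users are hypothetical names introduced here for illustration only.

```fsharp
open System

type Role = Administrator | Editor | Viewer
type UserId = UserId of Guid
type User = { Id: UserId; Name: string; Role: Role }
// Assumption: only ToDo appears in the article; Done is a placeholder case.
type State = ToDo | Done
type TaskId = TaskId of Guid
type Task = { Id: TaskId; Title: string; State: State; Assignee: Option<UserId> }

module Authorization =
    // The record representation is private, so code outside this module
    // cannot construct an Auth value on its own.
    type Auth<'p> =
        private { permission: 'p }
        member this.Permission = this.permission

    type CreateTask<'a> = CreateTask of 'a

    let getCreateTaskAuth (user: User) : Option<Auth<CreateTask<User>>> =
        match user.Role with
        | Administrator -> Some { permission = CreateTask user }
        | Editor | Viewer -> None

open Authorization

let createTask (_: Auth<CreateTask<User>>) (title: string) : Task =
    { Id = TaskId(Guid.NewGuid()); Title = title; State = ToDo; Assignee = None }

// Hypothetical helper: the only route to createTask goes through the auth check.
let tryCreateTask (user: User) (title: string) : Option<Task> =
    getCreateTaskAuth user |> Option.map (fun auth -> createTask auth title)

// Hypothetical helper: the permission still carries the whole User,
// so later code can recover who was authorized without another lookup.
let authorizedName (auth: Auth<CreateTask<User>>) : string =
    let (CreateTask user) = auth.Permission
    user.Name

let admin : User = { Id = UserId(Guid.NewGuid()); Name = "Ada"; Role = Administrator }
let viewer : User = { Id = UserId(Guid.NewGuid()); Name = "Vera"; Role = Viewer }

let created = tryCreateTask admin "A title"
let refused = tryCreateTask viewer "A title"
```

Unlike the Boolean version, a bare call such as createTask "A title" no longer type-checks, and writing { permission = CreateTask viewer } outside the Authorization module should be rejected because the record representation is private.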
So there you have it; a small example that, hopefully, demonstrates well enough what I mean by maximizing information. Alexis King calls this “parse, don’t validate”, and she makes a great case for this very point: don’t discard information. The same essential idea is broadly captured in Eric Steven Raymond’s[2] partial list of UNIX philosophy prescriptions:
“When filtering, never throw away information you don’t need to”.
I’ll end with some recommended practices that follow from this idea of information maximization:
[1] This was the main point in the previous article, and was solved by encoding the authorization requirement in the function’s input parameter type.
[2] “The Art of Unix Programming” (Boston: Addison-Wesley, 2004).