Hacking life, the binary way.

Why Post-Functional Programming Matters

| Comments

The limits of my language mean the limits of my world.” – Ludwig Wittgenstein

Computer (and computing) science (CS) is a very peculiar field of research when compared to other sciences, particularly concerning themes such as software engineering and system’s architecture. For starters, it has little more than 80 years, which puts it into an interesting proposition when compared to Physics or Mathematics: having a “conversation” with key personalities is not that unrealistic. Unlike Aristotle, Isaac Newton or Albert Einstein, many of the “genius” behind computer science are still living among us; a gentle evidence of its youth.

On the other hand, the liberal use of the term “science” (even engineering as in computing engineering), makes our Physics friends flinch at our day-to-day methodology. We don’t calculate programs. We don’t derive implementations. And when something goes wrong, our assessment method is so empirical that it would be equivalent to building a copy of the failed bridge, and observe bolt by bolt – with intuition (and stack dumps) guiding which bolts to observe first – hoping to find the one that fails. This arguable “lack of scientific rigour” is well reflected when Donald Knuth wrote one of the most rigorous treatises on computer science, and decided to call it “The Art of Computer Programming1.

But this constant attitude towards the synthesis of programs doesn’t mean that development cannot be made more rigorous 2, or at least more scientific or closer to the exact sciences. The hacker way of coding first, asking questions later is more due to the attractiveness of computers as a creation tool than to any inherent property (or limitation) of the field. Researchers and professionals of the so called critical systems had demonstrated this possibility years ago, with the aid of formal methods 3. Much of this research builds upon a rigorous abstraction of computability and language, and is known as Lambda Calculus. Formalized circa 1930, it has enabled the research on the behaviour of computational systems, the creation of predictive models of its properties, and to uncover high-level patterns of computation that abstract irrelevant details and empower the creation of even more complex systems. That’s what happens when you use mathematics, and that’s what makes a science… well… scientific; much more important than the capability of explaining current observations, is the ability to “predict” yet unobserved phenomena.

But we don’t need to go into “heavy” formal systems, such as Coq or Isabelle, to reap the benefits of mathematics, and still have a practical framework to build and deploy products 4. In fact, the 1950s gave us such tool: LISP 5.

Notwithstanding, most of the tools that we use today are not LISP based. But most are C based, and despite the fundamental differences Alan Kay introduced with Object-Oriented languages, they are (still) an imperative variant. And with imperative languages, any relation between the math behind the problem, and the implementation artefacts (source-code), is the sole responsibility of the developer. We know the mathematical model behind the QuickSort algorithm, and of the binary search in a list. But none of these relations “emerge” when we contemplate (or glance) at its source-code in C. It is, in its most fundamental sense, a list of commands that a system “rigorously” obeys 6, though it represents an extremely close model of what the CPU is doing, and as such it has enjoyed outstanding run-time performance and massive adoption 7.

On the other end of the spectrum there are the LISP (and ML) variants and other Lambda-Calculus inspirations. This isn’t about a mere syntactic preference, favouring parenthesis () over brackets {}. This is about a fundamentally different perspective, where programs no longer manipulate memory nor define their own flux, but are more regarded as algebraic equivalences and substitution of symbols, where memory and flux are implementation details of the machine. These languages are called functional because they focus on verbs, on functions and composition; mathematical functions as transformations and relations, in contrast to imperative languages that focus on nouns, on things, what they are, and their corresponding capabilities.

But discussing programming languages is hard… We know that people tend to defend their choices by considering language criticism a personal attack on themselves, and stay inside their comfort zone. Discussing programming paradigms is even more difficult; the typical Java developer may be opinionated about C#, but it will probably be very defensive against… say… Erlang or Haskell. Endless discussions over parenthesis or commas will spring to life. Code readability (whatever that is) will be summoned. Even the categories of average programmers and real-world programs will be promptly accepted as axiomatic facts of daily programming. Irony and sarcasm will reign, such as the recent dichotomy of programming being either “math” or “interpretative dance” 8.

So why give a 60 year old concept any importance? The above was tried and claimed years ago, and yet, imperative (and OO) languages are now the norm. Perhaps once we understand what Functional Programming enables us to do today (instead of discussing semantics over what it is), we will come to understand what Paul Graham once said 9:

… the short explanation of why this 1950s language is not obsolete is that it was not technology but math, and math doesn’t get stale. The right thing to compare Lisp to is not 1950s hardware, but, say, the Quicksort algorithm, which was discovered in 1960 and is still the fastest general-purpose sort.

So, without much ado, here’s the top 5 reasons Why (Post-)Functional Programming Matters for the ad-tech companies 10:

Reason N. 5 They scale!

“The way the processor industry is going, is to add more and more cores, but nobody knows how to program those things. I mean, two, yeah; four, not really; eight, forget it.” – Steve Jobs

Pick a software architect – any architect that has drawn large-scale software systems — and ask her about design and architectural patterns, say Pipes and Filters, Client-Server or the Observer pattern; she’ll undoubtedly tell you is that they are now accepted as an invaluable tool to create, communicate and document the orchestration among the myriad of components in a large system. The legacy we inherited from Gamma et al., heavily inspired in the works of Christopher Alexander 11, goes well beyond the mere catalogue of recurrent designs. It gives us a framework to capture and systematize empirical knowledge, to uncover new relations and to think at higher levels of abstraction.

But these “classic” patterns lack something of a greater value too: they lack the mathematical formalism that allows one to reason about their structure and their interactions 12. Let me tempt you with an example. Consider both the set of natural numbers and an operation (such as sum) over them; there are a lot of properties about the sum. First, sum is an endofunction, which is a funny way to say that if you sum two natural numbers, you get a natural number as a result. It also has the so called neutral element: if you sum something with zero, you get back the same number. Sum is also an associative operation; (1+2)+3 is equivalent to 1+(2+3). These properties – an associative endofunction with an identity – form what is known as a monoid 13.

There are endless monoids over endless types of structures. String concatenation, for example, form a monoid over strings (with the identity being the empty string). One can simply regard the fancy name monoid as a “pattern” of structures; but a different type of pattern in the sense that it is strictly defined 14.

Now your question should be: what does this matter to me, or to the construction of advertising systems? Very simple. Monoids (being associative) enjoy a very interesting property: they scale! Because (a+b+c+d) is the same thing as (a+b)+(c+d), you can send the first part (a+b) to a machine, and the second part (c+d) to another machine. The two operations can then execute in parallel, and their result can be merged by yet another machine 15. This is very close to the fundamental ideas beyond the mapreduce model, recently popularized by Google’s implementation, and as you can see, you don’t really need to rely on such specific technologies to reap the same benefits 16.

So Functional Programming scales, because it allows the programmer to scale. It brings a multitude of powerful abstractions, which not only avoids re-inventing the wheel, but allows a more “precise” reasoning over programs, its properties and their resulting quality.

Reason N. 4 They really scale!

The first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time.” – Tim Cargill

Software is more than just writing code. It’s debugging code! And there’s a root to all evil when it comes to debugging: mutability. I once spent hours of trying understand why a certain program ceased to function if I executed it step-by-step, but worked OK if I just jumped over the function. The problem? The debugger window was displaying the local variables and some good, well-intended soul, decided to override a getter to cache the result. And that specific getter function was crashing the application.

But there’s more… If we never know when objects will change, we can’t possibly know if removing an apparent superfluous call, or changing the order of execution, will output the same behaviour 17. We’ve all seen this taken to the extreme in dynamic languages such as JavaScript. Undefined functions and fields springing to life, because some previous call defined and re-defined them. A living hell to debug. 18

Functional Programming scales, because it gives better guarantees about the behaviour of your programs. They are easier to test, because they are more “isolated” in a certain sense. But much more important it empowers safer redesign and re-factoring; it makes program more resilient to change!

Reason N. 3 They scale even if you do nothing about it!

“A programming language is low level when its programs require attention to the irrelevant.” – Alan J. Perlis.

Now imagine if pieces of programs could be regarded as pure, i.e., having absolutely no side effects. What would this mean?

If parts of a program are pure, then one can rearrange the order of their execution. They won’t rely on any context, besides the parameters one passes to them. This key idea of FP is called “referential transparency”; functions do not behave differently the second time you look at them: you give the same inputs, you get the same result. A trivial idea with profound implications, because if there’s absolute guarantee that changing the order of execution will produce the same results, then one gains scalability for free, by, for example, executing different parts of the program in different threads or machines.

But there’s more to purity than inherent parallelization; things that really pure™ functional languages 19 take advantage of: laziness and memoization. Avoiding the risk to incur in a large technical debate over the advantages and disadvantages of such techniques, let us just state what they provide, and leave to the reader the decision over their suitability for the ad-tech industry.

Laziness and memoization could be translated to two obvious principles: “don’t compute something until you need it” and “don’t recompute something you have computed before”. Though trivial principles, the programmer usually has to be aware of them throughout the code; here’s a list, multiply the elements by two, give me the first one. Wait… why don’t I just take the first one, and multiply it by two?

And what about loading a webpage? Show me the first article (written in markdown); translate it to HTML. Show me again the first article; translate it again… A thousand of users requesting to view the first article; a thousand of translations of exactly the same content. Hmmm… Maybe we should once and for all translate the first article and store it somewhere? Let’s throw in a cache.

Suffice to say that FP – particularly due to the principle of “referential transparency” – provides the ability to tackle those concerns at the compiler level. A sufficiently advanced compiler can even achieve much of that for us… automatically; it may not throw in a cache, but it will easily memoize the results of a function whose parameters are always the same. FP scales even when you don’t make a deliberate effort for that.

Reason N. 2 They provide a new kind of glue

Such a catalogue of “advantages” is all very well, but one must not be surprised if outsiders don’t take it too seriously. It says a lot about what functional programming isn’t (it has no assignment, no side effects, no flow of control) but not much about what it is. The functional programmer sounds rather like a mediæval monk, denying himself the pleasures of life in the hope that it will make him virtuous.” – John Hughes

So far it seems that if we strip other languages from features such as mutability and control flow, then we would have the same benefits of functional programming, without the hassle of learning a new language. Hence, the question remains: why should we care to learn FP in the first place?

Flashback circa 1968. We are near Garmisch Germany, and NATO is holding the seminal conference on “Software Engineering”. So far everyone seems to agree on one particular conclusion: software is in crisis. Projects are running over-budget, over-time, and with low quality. They are inefficient, they don’t meet the requirements, their code is difficult to maintain. Some software isn’t even delivered… ever! They realized all this before Windows™ was presented to the world. 20 Forty years later, what have we learned? 21

From the several ideas there discussed, one of them looked at computer software as they looked at complex electronic systems: software should be componentized, i.e., built from prefabricated components 22. Powerful idea! In the subsequent years, we saw the appearance of Unix’s Pipes and Filters, Objective-C, OLE, COM and the concept of Frameworks in general.

On the object-oriented side, a similar race gave us the AbstractSingletonProxyFactoryBean… whatever that is. 23

The crux of modularization is a two-step process: (1) you first divide your problem into sub-problems, and then (2) you combine (glue) the results. You can look at this like an electronic circuit; you separate your problem (a computer) into sub-problems (CPU, Memory, GPU…) and then you provide the glue (copper circuits and wires that roughly match inputs with outputs).

What happens when you provide the wrong kind of glue, say, plastic instead of copper, or wire an input to another input? Best-case scenario: it doesn’t work. 24

Another (different) way to view modularization is like a big puzzle. Pieces have a shape and they only compose if their shape match. If done correctly, you are kept safe from combining components that don’t talk to each other, simply because their shape doesn’t allow it. This is akin to the experience given by (strongly-typed) functional languages. 25 Instead of declaring capabilities of objects, you define transformations that accept things of a particular shape. For example, to filter out elements of a container structure, you don’t really need a function for every kind of containers out there (Lists, Trees…) As long as the structure you’re providing has the shape of something that can be iterated over, then you simple iterate ruling out objects that don’t match your filter. 26

Functional programming provides new kinds of glue, such as higher-order functions, function composition, point-free definitions, type-classes and higher-kinded types. Some languages can even check these “glues” at compile time, making your program safer. You write less code, you reuse more, and it keeps you from reinventing the square wheel.

Again, consider the number of components an ad-tech platform now comprehends, yield managers, ad decision logic, RTB clients, forecast engines, logging, campaign optimizers, and you’ll understand why improving modularization (and making sure they fit with each other) is a crucial step in the industry.

Reason N. 1 Because it’s cool

“C is quirky, flawed, and an enormous success.” – Dennis M. Ritchie.

Much of the above was known more than twenty years ago, at a time where C was the de-facto king of programming languages. And yet, here we are, claiming again and again the advantages of functional programming. What really changed? Is imperative programming just getting old? Is Functional Programming becoming a hype?

The fact that C was a huge success is (arguably) a mystery 27. C is (was) well-known, widely adopted by both the industry and the academia, and hence enjoyed an incredible amount of scrutiny. A huge amount of C libraries and frameworks became promptly available in the wild. Yes, everyone knew C wasn’t perfect. In fact, C’s unsafe type system was an imminent disaster. Turing award laureate Tony Hoare once called the “null reference” 28 its billion-dollar mistake. And yet, even today, C ranks second in the Tiobe popularity index. Why?

Perhaps the keyword here is establishment. All those frameworks and libraries and tools built for C (in C) made it the number one choice. People didn’t even think for a second. If C is popular, and has everything I need, why choose another language?

Of course, the fallacy lies in the definition of “need”… Notwithstanding, there’s a bitter lesson here, learned the hard way by Haskell proponents: if you fail to provide the complete experience, it really doesn’t matter how good your product is.

So what changed?

We’ve now come full circle to the title of this article. Maybe some of you have been wondering about the usage of the prefix (post-) in post-functional. What is different today from what we had 20 years ago, is that the functional paradigm is no longer relegated to obscure, academic corners, whimsically forcing you out of your comfort zone.

Instead, what we are now observing is a merge between object-oriented and functional languages, at different degrees of realization. At the lowest level, we see the latest version of the C++ standard (0x11), incorporate the notion of lambdas and function objects. The same is true for Java 8, and was true for C# 2.0 and all its subsequent versions. And it’s not just higher-order functions. In Biancuzzi’s book “Masterminds of Programming”, we can read Anders Hejlsberg (lead C# developer) rant on how the future of the language lies in adopting FP techniques to cope with multi-core/parallel programming. We may even go as far as Microsoft’s release of F# as a first-class language for the “big-data” industry 29, that runs on top of the CLR.

And finally we have Scala. A language created by Prof. Martin Odersky, that fuses object-orientation and functional programming at their most fundamental level. Its secret? It runs on top of the Java Virtual Machine (JVM), and seamlessly interoperates with existing Java libraries and frameworks. It was specifically designed for scaling at large, and has enjoyed several successful adoption stories in the industry, including Twitter, LinkedIn, Foursquare, and The Guardian, just to name a few. “Functional Programming Principles in Scala30, an MOOC held by Coursera, had more that 50.000 students registered, becoming one of the most successful on-line courses. It’s not hard to conclude that, if these players are using Scala, maybe one should keep an eye on it.

So, post-functionalism is about realizing that the ecosystem matters. Instead of trying to get people out of its comfort zone, it brings those features into what people are comfortable with 31. Post-functionalism is about realizing that both object-oriented and functional programming can contribute to a new (merged) thing: OO is an excellent tool to design “components”, the outward limits of systems, while FP provides an invaluable tool to express its inwards. This thing is object-functional programming.

(Post-)Functionalism is now a reality. To some extent, it doesn’t really matter if you’ve been evangelized by this article. I’ve claimed that (post-)functional programming will give you better, more robust, more scalable, more modular programs. It would make you better cope with the specific needs of the fast growing industry that is the ad-tech, because the issues ad-tech has (high-availability, high-scalability, robustness, fault-tolerance, adaptiveness) are the issues that FP learned to solve decades ago. But, perhaps, the most pragmatic reason why post-functional programming matters is because you will no longer be able to ignore it.

  1. Where Knuth lift the status of Art to the mere act of reading his work.

  2. People seem to mistake rigorous with rigid. The former implies rules, while the later implies resistance to change.

  3. Before most readers run away upon gazing the words “formal”, please consider than I am a professor of Agile Methodologies, so bear with me for a while :-)

  4. I would argue we “shouldn’t”, unless we are hardcore researchers is those topics.

  5. And yet before another exodus en masse, do please consider that Java and C# are also part of my courses’ syllabus.

  6. Obeying rigorously doesn’t make the act rigorous.

  7. These reasons are no longer true; today’s compilers are much more efficient at producing assembly code than an human is, due to the magic of out-of-order execution, caches, pipelines, multi-cores and SIMD instructions.

  8. See this.

  9. Another key personality that made such bold assertions was turing laureate Alan Kay, the father of Object-Oriented programming, who once said he regarded LISP as the “Maxwell’s Equations of Software” (see this interview).

  10. There are countless many other reasons, not necessarily exclusive of FP, some of which are highly technical and out of scope of this article. For the curious, read about unified type systems, type inference, anonymous functions, closures, immutability, lazy evaluation, delimited continuations, higher-order functions, higher-kinded types, nested functions, currying, pattern matching, algebraic data types, tail recursion, partial functions, structural types, dependent types, explicitly typed self references, type bounds, variance, type annotations, type views, type classes, and follow the white rabbit…

  11. A mathematician and civil architect.

  12. Lacking something of value doesn’t make a thing worthless.

  13. If we also consider the inverse of addition (subtraction), then we have a group.

  14. Mathematicians may even argue that these are the true type of patterns.

  15. Another cool property (this one given by groups) is that there are some operations that cancel each other; if you sum and then subtract the same number, you come back to where you’ve started. In a more general sense, if your functions form a group, then you’ve just gained some optimisation techniques for free.

  16. Mapreduce is actually a very good lesson on how people are easily attached to labels. A few years ago, even before “big data” became the buzzword it is today, everyone in the ad space thinking on “big” data was talking about mapreduce. However, and despite its fundamental principles being mostly those of functional programming, many developers that loved mapreduce simultaneously loathed anything related to FP. Today’s hype in big-data is Hadoop. Go figures…

  17. There’s something very dysfunctional in the way dynamic OO programming deals with mutability. On one hand, the whole idea behind OO is that (inner) details don’t matter. A car is a car, and it has the functionality expected from a car. You “ask” the car to start, accelerate, break and stop. It’s a powerful concept particularly when coupled with abstraction; every vehicle should adhere to this interface of accelerating and breaking. What really spoils everything is that you never know when a Car will start behaving differently the second time you look at it. Now, what you can argue – and I concur – is that there’s something here associated with scale: big things (like cars or databases) are expected to change internally. But a clog is not. And when clogs suddenly transform into valves…

  18. Some people, when confronted with a problem, think: hey, I can solve this with reflection. Now they have two problems. As with every tool, meta-programming is not a free lunch.

  19. Purity, as with so many other qualities, comes in sizes.

  20. Perhaps not a fair joke, but I couldn’t resist :)

  21. Actually, we learned a lot; we now know that requirements are expected to change, and that gave us Agile Methodologies. We know that Moore’s Law is coming to an end, and that gave us Parallel Programming. Quality is important, and that gave us Test-Driven Development. We also know that a suitable algorithm is orders of magnitude better than “premature optimization”… But, we kinda keep forgetin’ about all that.

  22. Doug McIlroy, “Mass Produced Software Components” in Proceedings of Software Engineering Concepts and Techniques, 1969.

  23. A convenient proxy factory bean superclass for proxy factory beans that create only singletons.” (see here and here).

  24. Worst-case scenario, it explodes in your face.

  25. In nature, proteins also act in a similar fashion: they bind to other molecules due to their “shape”.

  26. This simple notion of filtering a collection is a nuisance in most imperative programming languages. Not only most of them can’t even return the same collection type (e.g., in C# you start with a List and end with an IEnumerable), but the filter implementation is repeated for every kind of container.

  27. A kind of mystery like the video-tape war.

  28. In fact, he was talking about Algol, to which C owes its roots.

  29. I’m deliberately incurring into an “appeal to authority” fallacy.

  30. See this.

  31. To be completely honest, functional-programming also comes riddled with mathematical jargon and category-theory concepts that may make your head explode. Your mileage may vary…

An Exploration of Genetic Algorithms in Scala

| Comments

Genetic algorithms are search heuristics that mimics the process of natural evolution, by adhering to nature-inspired techniques such as inheritance, mutation, selection, and crossover. In this post, we’ll see how to build a simple mechanism in Scala that would allow us to evolve specimens.

Let’s first start by defining our contract. Here’s some basic assumptions:

  • A genetic exploration evolves Specimens.
  • Specimens are made up from sequences of Genes.
  • There’s a finit number of possible genes, hence we’ll need a GenePool.
  • At any point, we should be able to calculate how fit a specimen is; i.e. a function from Specimen to Double.
  • There’s a probability of a single gene to mutate during the crossover.
  • There’s a maximum population (number of Specimens) to simulate.
  • It is possible to decide when to stop evolving our specimens (StopCondition).

The above can be represented by the following class signature:

class GeneticExploration[Gene, Specimen <% Iterable[Gene]]
    (val mutationRate: Double,
     val population: Int,
     genePool: Array[Gene],
     specimenBuilder: Iterable[Gene] => Specimen,
     fitnessF: Specimen => Double,
     stopCondition: List[Specimen] => Boolean)

What can we use as specimen? What about a String of characters? Strings are perfect candidates for specimens according to the above signature, since each indidivual character can act as a Gene (i.e. a String can be viewed as an Iterable[Char]). Here’s a specimen:

val target = "as armas e os baroes assinalados"

And here’s the corresponding gene pool:

val genePool = Array('a','b','c','d','e','f','g','h',
                     'y','z',' ')

Using String specimens makes calculating the fitness fairly straightforward:

def fitness(src: String): Double = src.zip(target).count { case (s, t) => s == t }

In other words, the fitness of a string is equal to the number of correct characters according to our target goal. Sometimes we’ll need new random genes from our gene pool. Let’s build an infinite stream of them:

def randomGenes: Stream[Gene] = genePool(Random.nextInt(genePool.length)) #:: randomGenes

… which makes generating a random new specimen also straightforward:

def newSpecimen(len: Int): Specimen = specimenBuilder(randomGenes.take(len))

It seems we have a lot of things in place. Let’s start by initializing a GeneticExploration class:

val petri = new GeneticExploration[Char, String](
    0.01, 500, genePool,           // rate of mutation, max population and gene pool
    cs => new String(cs.toArray),  // how to build a specimen from genes
    fitness,                       // the fitness function
    _.exists(_ == target)          // the stop condition

Where to now? We need to evolve specimens, but for that, we’ll need an initial pool of primordial specimens. Let’s capture the usage before implementing those functions:

val evolvedSpecimens = petri.evolution(petri.randomPool(archetype))

An archetype is simply a seed specimen from which the primordial soup is made of, as seen in the implementation of randomPool:

type Pool = List[Specimen]

def randomPool(archetype: Specimen): Pool =
  (1 to population).map(_ => newSpecimen(archetype.size)).toList

It is time to code the “main” function, which basically evolves a pool (by mating its specimens) until the stop condiction is met:

def evolution(pool: Pool, epoch: Int = 0): (Pool, Int) = {
    val newGeneration = popReproduction(matePool(pool))
    if (stopCondition(newGeneration)) (newGeneration, epoch)
    else evolution(newGeneration, epoch + 1)

Mating involves finding the specimens more fit to survive and leave descendency:

type MatePool = List[(Specimen, Double)]

def matePool(pool: Pool): MatePool = {
    val fitnesses = pool.map(fitnessF).toArray

With the typical renormalization (i.e., the sum of all fitnesses in the pool must be equal to 1):

def renormalize(vector: Array[Double]): Array[Double] = {
    val sum = vector.sum
    vector.map(_ / sum)

How do the population as an whole reproduce? Well, mates are selected from the mate pool using their fitness as the probability of being chosen:

def popReproduction(matePool: MatePool): Pool =
  (1 to population).par.map(_ =>
    crossover(monteCarlo(matePool), monteCarlo(matePool))).toList

A naive implementation of the Monte Carlo method could be coded as:

def monteCarlo[A](weightedList: List[(A, Double)]): A =
  weightedList(Random.nextInt(weightedList.length)) match {
     case (s, f) if f > Random.nextFloat => s
     case _ => monteCarlo(weightedList)

The crossing over of two specimens is here accomplished by randomly recombining their genes:

def crossover(a: Specimen, b: Specimen): Specimen =
  mutate(specimenBuilder(a.zip(b).map(gene =>
    if (Random.nextFloat >= 0.5) gene._1 else gene._2)))

… though sometimes mutations occur, i.e., random genes that were not inherited:

def mutate(s: Specimen): Specimen =
  specimenBuilder(s.map(gene =>
    if (mutationRate > Random.nextFloat) randomGenes.head else gene))

And that’s it! Go ahead and try… You can use the target as the archetype. See how many generations does it take for the algorithm to find out the very first phrase from Os Lusíadas.

Turtles All the Way Down (or Up?)

| Comments

The line separating between data and model is blurred when speaking about meta-data, in the sense that everything is, ultimately, data; only its purpose is different. For example, the information that some particular video in our system is named The Matrix, and another one named Lord of the Rings, is called data for the purpose of using it as an information system for video renting. We could hypothetically draw a line encompassing all the objects that account for data (normally called instances) and name it the meta-level-zero () of our system.

The way we would typically model such a simple system in an object-oriented language would be to create a class named Video, along with an attribute named Title. But what may be considered the model in one context, may be seen as data in another, e.g., for the compiler. As such, this information is meta-data in the sense that it is data about data itself: in fact, it conveys a very crucial information, which is the data’s structure (and meaning), for the purpose of specifying an executable program. Once again, we could draw a line around these things that represent information about other things — classes, properties, etc. — and call them meta-level-one (), or simply model.

But, what exactly is a class, or a property? What is the meaning of calling a method, or storing a value? As the reader might have guessed, once again, there is structure behind structure itself — an infrastructure — and the collection of such things is called meta-level-two (), or meta-model for short (i.e., a model that defines models), which is composed of meta-classes, class factories, and other similar artifacts.

Hence, when we talk about data (or instances) we are referring to — bare information that doesn’t provide structure. By model we are referring to — its information gives structure to data. By meta-model we are referring to — information used to define the infrastructure. And so on…

Ultimately, depending on the system’s purpose, we will reach a level which has no layer above. This “top-most” level doesn’t (yet) have a name; in MOF it is called a meta-meta-model, due to being the third model layer 1. This building up of levels (or layers), where each one is directly accountable for providing structure and meaning to the layer below is known as the Reflective Tower, a visual metaphor that can be observed in the following figure:


Figure 1. The meta-class of the class class is the meta-class class. Or in other words, classes’ meta-classes are classes too.

All this would not be very useful if it did not had a purpose. I already mentioned the compiler, whose task is to read a particular kind of information (known as source code) and translate it into a set of structures and instructions (known as a program), which would later be executed by a computer — a process known as compilation. The compiler acts as a processing machine: the input goes into one side, and the outcome comes from the other. Once the compiler has done its job, it is no longer required, and so it does not observe nor interact with the final program. Should we wish to modify the final program, we would need to change the source code and handle it again to the compiler.

Now let us suppose we wanted to add a new property to a Video, like the name of its Director, or create new sub-types of videos as needed, like Documentary or TV Series, each one with different properties and relations? In other words, what if we need to adapt the program as it is running? For that, we would need both to observe and interact with our running application, modifying its structure on-the-fly (the technical term is during run-time). The property of systems that allow such thing to be performed is called reflection, i.e., the ability of a program to manipulate as data something representing the state of the program during its own execution. The two mentioned aspects of such manipulation, observation and interaction, are respectively known as introspection, i.e., to observe and reason about its own state, and intercession, i.e., to modify its own execution state (structure) or alter its own interpretation or meaning (semantics).

The technique of using programs to manipulate other programs, or the running program itself, is known as meta-programming, and the high-level design of such system is called a meta-architecture.

  1. Would it be the sixth, I seriously doubt anyone would apply the same prefix five times.

Some Words on the Espistemology of Patterns

| Comments

The opening of third book on Pattern Languages of Program Design starts with the following sentence: “What’s new here is that there’s nothing new here”.

This single assertion seems to characterize the epistemological nature of patterns, in what concerns its methodology and goals; patterns result from the observation, analysis and formalization of empirical knowledge in search for stronger invariants, allowing rational design choices and uncovering newer abstractions. A pattern should not report on surface properties but rather capture hidden structure at a suitably general level. A comprehensive discussion on the epistemology of patterns and pattern languages can be found in a recent work by Kohls and Panke:

The argument that there is “nothing new” in a pattern must be rejected; otherwise there would be nothing new to physics either, since physical objects and the laws of physics have been around before, just as design objects or programming styles have been around before somebody sets up a pattern language or collection. The discovery and description of a new species is without question considered scientific progress. Of course, the animals are not new – they lived there before – but they are newly discovered. In the same way, “the most important patterns capture important structures, practices, and techniques that are key competencies in a given field, but which are not yet widely known”.

Preliminary Thoughts on Software Design - Part I

| Comments

I see the word design being loosely used almost every day. People say a particular framework has a good or bad design. They discuss “architectural design1. They even pay thousands for an expert to design their innovative software solution.

But, what exactly is design? What makes the difference between good and bad design? How do we measure it? Why is it a key factor in the development of sophisticated computer systems?

The Webster’s dictionary defines it as the act of (i) “working out the form of something”, (ii) “sketching a plan for something”, (iii) “creating something in the mind”, or (iv) “doing something for a specific role or purpose or effect”. And it also seems that Software Engineering has also been faced with the task of sketching, planning, and creating suitable products for specific purposes.

But Software Engineering, aspiring to become a quantitative science, goes beyond mere sketches. Software solutions drafted during the design of a software artifact do consider forces such as (i) the experience of the team, (ii) the available budget, (iii) technical constraints, etc. In fact, software projects were historically recognized as having four major forces through which any particular balance (or imbalance) of them directly influences the viability of a specific solution:


Figure 1. Here, (1) Time refers to the amount of time available to complete a project; (2) Cost refers to the budgeted amount available; (3) Scope refers the project’s domain limit and detail. Each force mutually influences (either positively or negatively) every other. Consequently, (4) Quality is regarded as a result function of how the other forces are balanced.

But even taking into consideration these four major forces, the ever increasing complexity (both inherent and accidental) of building and managing software artifacts are pushing the limits of creation beyond the ability of a single entity. And similarly to every other areas of engineering, it is not uncommon for a single problem to have multiple ways to be coped with.

So how does an engineer choose the best design for a specific function?

As knowledge grows in a specific area, solutions are captured and documented by experts in books, research papers, concrete implementations, web pages, and a myriad of other types of communication media.

While we may intuitively think that any growth in the body of knowledge implies better design choices, it seems that the way (and the amount) this knowledge is being captured raises an important issue per-se. As pointed out by Schwartz 2, the overabundance of choice that our modern society has poses a cause for psychological distress, where the information overflow ultimately results on an over-simplification of criteria.

Actually, preceding Schwartz in four decades, Christopher Alexander claimed 3 that most information on any specific body of knowledge is “hard to handle, widespread, diffuse, unorganized, and ever-changing”. Moreover, this “amount of information is increasingly growing beyond the reach of single designers”. He concluded that “although a specific solution should reflect all the known facts relevant to a specific design (…) the average designer scans whatever information he happens on and introduces it randomly”. 4

Design, as an intentional act of choice, is constantly overwhelmed by the sheer quantity of available information. So, in the end, how does an engineer actually chooses the best design for any specific function?

  1. Whatever that is…

  2. The Paradox of Choice: Why More is Less

  3. Notes on the Synthesis of Form

  4. I don’t know about you, but realizing that most design choices are akin to a random walk isn’t very flattering to our profession.

All Gambozinos Are White: Part III

| Comments

Abstract. It is (finally) argued that there may be a case where all gambozinos are white.

To Induction or not Induction

We have argued that a consistent logic cannot support inductive conclusions based on the observation of instances, so how is mathematical induction reconciliated with logic? The trick is very simple; mathematical induction on natural numbers is actually a form of deductive reasoning, as shown in the following second-order clause:

…where is any proposition, , and are natural numbers, and assumes the lowest value for which holds (usually or ). Remember Socrates and the mortality of men? The concept here is very similar. One asserts that, for every proposition , if an individual has a certain property, the next individual also has that property. Since we know (by observation) that the first individual in a series has the property, then it follows that every individual has that same property. The universal statement is a consequence of the established premises, and not a generalization based on individual case analysis. QED.

All Gambozinos are White

This form of induction does not necessarily involve numbers; one can actually generalize it to any type of well-founded structure, i.e., any structure whose elements relate to each other in a finite number of ways, essentially creating a “chain”. Back to gambozinos, imagine that you are able to assert that (i) if a gambozino is white, its descendants will always be white, and (ii) the first two gambozinos to exist were white 1. Then, by the nature of the structure that rules the gambozino ascendency, all gambozinos are proven to be white 2.


Mathematical induction is a powerful tool in deductive reasoning that allows to prove properties of an infinite number of elements without having to actually observe every one of them. It works whenever the elements we are dealing with are part of a well-founded relation, and we are able to assume properties over that relation. Mathematical induction is thus well beyond inductive reasoning, able to assert the veracity of an argument over its mere plausibility.

  1. There are actually premises that we’ve disregarded for the sake of simplicity, such as (iii) except for the first two of them, a gambozino can only exist through sexual reproduction, and (iv) the parenthood of a gambozino is an anti-simetric, anti-reflexive and anti-transitive relation.

  2. Unless genetic manipulation is allowed, but then you would be attacking the premise, not the conclusion.

All Gambozinos Are White: Part II

| Comments

Abstract. It is (still) argued that there may be a case where all gambozinos are white.

There are Infinitely Many Primes

In inductive reasoning, the premises do not guarantee the conclusion, although they may give it some probability or plausibility. In order to prove an universal claim one have to observe every instance of that claim, or else assume it as a (potentially refutable) hypothesis.

But, mathematicians keep proving things about numbers without actually observing every one of them. For example, while it is only believed that every even number is a sum of two primes 1, it is actually known that there is an infinite number of primes. So, if it is true that mathematical induction involves a sort of generalization, how can we ensure its validity within a logical framework?

Mathematical induction is actually a very different type of reasoning, and the art goes as back as 2000 years. Euclid is recognized as probably the first one to have implicitly used it for proving that there are infinitely many primes. The reasoning is similar to the following (from Wikipedia): suppose you were searching for prime numbers, and you had already collected a very fine list of them, . Let be the product of all the prime numbers in the list, . Let . Then, is either prime or not: (i) if is prime then there is at least one more prime than is listed, and (ii) if is not prime then some prime factor divides . This factor is not on our list: if it were, then it would divide (since is the product of every number on the list); but as we know, divides . Then would have to divide the difference of the two numbers, which is . But no prime number divides so there would be a contradiction, and therefore cannot be on the list. This means at least one more prime number exists beyond those in the list. QED.

Mathematical Induction

The above proof is based on a very particular type of structure inherent to natural numbers, and it is precisely that structure that allows us to prove something for every number, despite there are infinitely many of them. Let us delve a little bit more on how a proof by induction works before coming back to logic. Suppose you want to prove that the following statement holds for all natural numbers :

The proof consists of two distinct (but intertwined) steps: first, we show that the statement holds when is equal to the lowest value that is given in the original statement, that is when :

Then we need to show that, if the statement holds for some , then the statement also holds in the subsequent of n, i.e. when is replaced by :

Using equation 1, we can rewrite the left-hand side, so all that remains is to (algebraically) prove the equality:

which is trivial. Therefore (1) holds. QED.

The Falling of Dominoes

The previous application of induction is based on the fact that every natural number is connected to every other by a known rule: summing. In fact, if you take the lowest of the natural numbers 0, and keep adding 1 to the result, you will eventually reach every natural number that it exists. Therefore if you prove that (i) if any arbitrary number has a property , then must also have that property, and (ii) the lowest of those numbers has , then it follows that every number has that property.

And here lies the slight of hand! Mathematical induction is similar to the sequential effect of falling dominoes. Put every one of them in a line, and prove that, if an arbitrary domino falls, the one next to him must fall 2. Then push the first one and, voilá: every one of them falls.

  1. Actually, the Goldbach conjecture is a fine example of how mathematical induction is different from simple induction. It states that every even integer greater than 2 can be written as the sum of two primes, e.g. 10 = 7 + 3. But despite no even number ever found violates this rule, the conjecture remains mathematically unproven.

  2. This may seem tricky, but you could assume some form of consistent newtonian physics, and a fixed distance between pieces.

A Pattern Language for Hopeless Argumentation

| Comments


  • It’s in the Bible. Claiming truth by asserting expertise from an outside source; aka Appeal to Authority.
  • Why Not? Changing the burden of proof to the opponent; aka Inversion of Responsibility
  • Moving the Goalposts. Keep raising the bar of the opponent arguments, until it fails; aka Demand of Impossible Perfection.
  • They Will Never Take our Freedom. Appealing to common sense or wise sayings.
  • I’m gonna Cry. Making the audience empathise or otherwise identify with yourself through shown emotions; aka Appeal to Pity or Sympathy.
  • This is Sparta! Use of an elequent and assertive discourse, targeting the audience emotions; aka Poetic Language.
  • Void. Asserting that the absence of evidence is the evindence of absence.
  • It’s the End of the World. Claiming that either the argument is true, or bad things would happen; aka Adverse Consequences.
  • Shaken, not stirred. Use of charismatic attributes to convince the audience; aka Personal Charm.
  • About those Ants… Deflecting the question by subtly changing the focus to another problem during the discourse.
  • No Gray Area. Implying there are only two alternatives, when in fact there are more; aka Excluded Middle.
  • Fast Talking. Speeding up the pace of speech such that it becomes hard to think and follow.
  • Buzzword Contest. Using an overwhelming ammount of technical terms, intended to confuse or look like an expert in the field; aka Prestigious Jargon, Argumentum Verbosium.
  • Pain in the Ass. Sistematically repeating the same thing, until it becomes an accepted truth; aka Ad Nauseaum.
  • Are we there yet? Keep making questions to disrupt the opponent thought or dodge the burden of proof; aka Argument by Question.

The Shape of Software

| Comments


Software may be regarded as the crystallization of an abstraction that models a specific domain. Ideally, it should match the exact limits of that domain. But in practice (i) those limits are fuzzy, (ii) software often imposes an artificial, unnaturally rigid structure, and (iii) reality itself keeps changing.

All Gambozinos Are White: Part I

| Comments

Abstract. It is argued that there may be a case where all gambozinos are white.


Most of us learn that there are two types of reasoning, namely deductive and inductive. In deductive reasoning, one usually starts from a general observation (a set of premises) and arguments towards a specific conclusion. For example, from the classic statement all men are mortal, along with the observation that Socrates is a man, it follows that Socrates is mortal. In a more abstract sense, we are asserting that, should a set of things have a certain property 1, and should something belong to that set, then it must have that same property, i.e. (i) , (ii) SocratesSocrates.

Inductive reasoning, on the other hand, makes generalizations based on individual instances. Imagine that you go outside in a quest to observe the gambozino, an animal so rare that it only appears minutes before the sunset in a rainy day. You were lucky to catch three young, white gambozinos, drinking water by the lake. Since it is the first time you see one, you assume that, probably, most gambozinos are white. In the following days you repeat the feat, and you now seem to be confident that all gambozinos are white. Hence, (i) , (ii) , (iii) , (iv) … ?

Of course it doesn’t seem valid to assume that, only because you have seen a dozen of gambozinos, all must have the same characteristics. But what if you had observed hundreds, or even millions? Is it sound to begin an argument based on a probability (most), and conclude an universal assertion (all), due to the sheer number of observations?

Proving an Universal

It was the skeptic Sextus Empiricus who first questioned the validity of inductive reasoning, by positing that a universal rule could not be established from an incomplete set of particular instances:

When they propose to establish the universal from the particulars by means of induction, they will effect this by a review of either all or some of the particulars. But if they review some, the induction will be insecure, since some of the particulars omitted in the induction may contravene the universal; while if they are to review all, they will be toiling at the impossible, since the particulars are infinite and indefinite.

According to Sextus Empiricus, only if one had observed all gambozinos could one conclude the universal statement, since even just a single particular would be sufficient to disprove the generalization. Therefore, (i) , (ii) .

Monotonicity of Entailment

There are other different ways to show that inductive reasoning isn’t a valid form of argumentation. For example, let us go back to Socrates and the mortality of men. Suppose Socrates is observed to never die 2. We should then reject the conclusion, not because of its form, but because of its premises; either Socrates isn’t a man, or not all men are mortal. Should we still accept the premises, then Socrates will (someday) die, by the sheer force of our logic.

Now suppose that we observe more men that don’t die. We may add this fact to our list of premises, but we now seem to be in a position of inconsistency. Some men die, others don’t, and thus being a man isn’t sufficient to guarantee its mortality: . We can’t even begin to reject the conclusion since our premises are contradictory. Otherwise, no matter how many new premises you add, the conclusion is always a direct consequence of the hypothesis.

This characteristic, i.e., that in a consistent argumentation one may add premises without affecting the validity of the conclusion, is called the monotonicity principle, and one can see that the inductive reasoning violates it: adding a black gambozino rejects the previous conclusion, so (i) , (ii) , (iii) … .

  1. This is formally equivalent to , but we are being relaxed.

  2. We may have a practical problem with this premise since, to observe that Socrates never dies, one would have to (i) wait an infinite amount of time, and (ii) be immortal.

How to Write an Academic Report

| Comments

Here are some tips for writing an undergraduate academic report:

Don’t State That Your Particular Work Is Above Average

The goal of evaluation is to evaluate. Some students may think they’ve surpassed their own expectations (and it’s highly encouraged a sense of self-criticism), but the ultimate responsibility for the grade is from the lecturer. Thus, when writing the conclusions chapter, focus on presenting the challenges you’ve had and if there is room for improvements. A student that doesn’t know what else to do with its own work is a student without vision. Nothing is perfect. Excellence does not require perfection; it requires self-awareness, though.

Motivation Is About The Significance Of Your Work

The study of the “Implications of Formal Methods in Systems with Unstable Requirements” or “Agile Methodologies for Safety-Critical Systems” may be motivational because there seems to exist apparent, intrinsic antagonies. Implementing something in Piet may be motivational due to the exquisite underlying language. These affirmations show more about why the student chose its work than something like “I did this because I always enjoyed riding bikes”: it shows that the student cared to analyze the significance of its work to the underlying course. Which, by the way, leads to…

Your Work Has A Context

If you’re taking a course on dogs, and you’re asked to make a work about cats, it’ll probably be wise to compare dogs to cats. It would be unwise to try to meow your way through it. In other words:


If the course is about Object-Orientation, and you ignore Inheritance, Polymorphism, Decoupling, Encapsulation and Abstraction, then get yourself ready for a low grade, even if it just works.

Don’t Waste Time On Prettying-up Reports

If your teacher doesn’t give you a pre-defined format, than go grab one of the standards out there: ACM, IEEE, LNCS… Unless you’re taking a course on typography or design, trust me: the font you’ll choose is hideous. Make use of graphics and images if they’re relevant. Aesthetic distractions waste your time, don’t give you extra credits, and will not replace useful content. But if you insist on designing your own report then…

If You Must, Read Something About Typography

Use sans-serif fonts for titles, and serif fonts for body with enough spacing. A combination of Arial/Georgia using 10/12 for text size is good enough. If you really care on making your report as aesthetically pleasant as possible, this book is for you.

Form Follows Function

Instead of putting your effort in the aesthetics of the report, apply it in the elegance of the solution. Do your work with pride, as if everyone would be looking at it. After some experience, one knows when something was made in a hurry - we all were once students, you know?

Use Collaborative Tools

If you’re technological savvy - as in a 2nd year CS/SE student - use a version control system, or a wiki, or a combination of both. If you aren’t, Google Docs is here for you. This is specially relevant within big groups: not only it’ll ease cooperation, but will provide keep an history of your work for later review - though I believe some don’t like that prospect ;-)

It’s Not The Goal, But The Path

The goal of the course is seldom to develop the best product in the market, or to fulfill 100% of the requirements - which by the way, unless stated, tend to be guidelines. The goal is to learn - not memorize - the underlying concepts. You should be able to, after several months, easily recall with small effort and then infer your way throughout the subject like a pro.

This said, claims such as “we’ve fulfilled all of the proposed requirements so we deserve a 20/20” are, at best, silly! In contrast to final exams, evaluation of essays have into account parameters like overall work complexity, elegance of the solution, and potential scientific relevance. Essays shouldn’t be regarded as functional black-boxes; the how is as relevant (if not more) than just being done.

Learn By Trial And Error, Reductio Ad Absurdum

If something doesn’t work, don’t rush calling your teacher or giving up. Read your error messages, try slight changes, learn to debug. Citing Guido van Rossum - “you are debugging your entire life”. If all else fails, start again from a consistent point - the use of version control systems starts to pay off here. If the problem reveals to be non-trivial, then discuss it in your report: extra credits for those that surpass difficult challenges!

Cite Your Sources

Original research is only required for a PhD. And even at that point, ideas don’t usually appear out of nowhere. Research is typically an iterative, incremental process, introducing small changes to an already existing body of knowledge.

Having said that, cite your sources. And if you don’t have sources, then learn to find them… à-priori! Learn to use Google and Google Scholar.

Also, unless stated, it’s not wrong to reuse - au contraire, it may reveal the pragmatism you’ll need in the industrial world. But, by all means, cite them!!! Credit where credit is due. And if you don’t like the culture of copy-paste - which you shouldn’t - then use your newly gained time to improve upon it.

Still, Google is not a valid source per se. Point to the actual pages you’ve used, and when appropriate, use those references inside the text of your report. Otherwise, it’s just a bunch of links.

And Finally

Writing, no matter what, is an art that should be practiced. Its ultimate goal is to act as a medium for communication. Thus, don’t disregard grammar, be objective, and always assume the target audience can’t read your mind.

And So It (re-)Begins…

| Comments

It was back in 2011 the last time I took the trouble of updating my old wordpress blog. I was just finishing my PhD thesis and so the last post was something like “Yay, I finished it”.

But, looking back, I never really took the trouble of setting up a real programmer’s blog. So, allow me to try to change that :-) Here’s a true Hacker’s blog done on top of Octopress and hosted at Github. Let the posts start rolling…