Elements of a good architecture

I was working with some legacy code lately where commercial goals were given precedence over technical quality. All the rules of writing solid, maintainable software were broken. The situation was terrible, with no sign of improvement. Architecture just wasn't part of the culture and had no real value in the eyes of the technical lead and the commercial department. After working with this code for over two weeks, I simply could not continue.

I had to advise them either to start over from scratch and do it right, or to continue with a cycle of bug fixes, "hoping" everything would keep running without me. The quote below aptly describes the state of their software.

A Big Ball of Mud is a haphazardly structured, sprawling, sloppy, duct-tape-and-baling-wire, spaghetti-code jungle. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. The overall structure of the system may never have been well defined. If it was, it may have eroded beyond recognition. Programmers with a shred of architectural sensibility shun these quagmires. Only those who are unconcerned about architecture, and, perhaps, are comfortable with the inertia of the day-to-day chore of patching the holes in these failing dikes, are content to work on such systems.

In the end, another healthy, motivated programmer and I left the team.

We belong to the latter category: "Programmers with a shred of architectural sensibility shun these quagmires."

Software design is like a game of chess

Software architecture/design is like a game of chess. You have many options, but some moves leave your position with more options than others. You could argue that the adaptability of your position is an indicator of quality: higher adaptability means higher quality, since it increases the probability that you can adapt to your opponent's attacks.

Good, solid and robust software shares these characteristics. In this article I would like to address cohesion and coupling, and show that they (should) underlie many architectural choices.

Cohesion is often contrasted with coupling, a different concept. Nonetheless high cohesion often correlates with loose coupling, and vice versa. The software quality metrics of coupling and cohesion were invented by Larry Constantine based on characteristics of "good" programming practices that reduced maintenance and modification costs.

So we could state that any choice that increases cohesion and decreases coupling in a certain domain would be a good architectural choice.

I believe this is correct from a purely technical point of view. There is also the human (cognitive) point of view. Good programs tend to get their naming right. Such software is easy to read, and its whole structure adapts naturally to the domain and to the vectors of change.

"There are only two hard things in Computer Science: cache invalidation and naming things." (Phil Karlton)

The vectors of change are an important element: some are easy to spot and others are not. It's a prediction game, heavily dependent on the perception of the architect and the domain expert at the time the design is made.

Anticipating too much change may lead to the Inner Platform anti-pattern: writing a language within a language.

Extract knowledge from the domain expert

The knowledge of the domain expert is captured in the language he/she uses to express it. One of an architect's most important skills is knowing how to listen and extract the right knowledge and patterns, so the domain can be modelled correctly in software.

If the domain is modelled correctly, the probability that the system can adapt to anticipated change increases, and the software communicates the knowledge of the domain. The domain expert should be able to read the domain-level code and verify its correctness.
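As an illustration, here is a minimal sketch of domain-level Ruby that a domain expert could read aloud and check. The library-lending domain, the names `Book`, `Member`, `may_borrow?` and `borrow`, and the two-book loan limit are all hypothetical, chosen only for the example.

```ruby
# Hypothetical library domain: every name and rule here is illustrative.
class Book
  attr_reader :title

  def initialize(title)
    @title = title
    @on_loan = false
  end

  def on_loan?
    @on_loan
  end

  def lend
    @on_loan = true
  end
end

class Member
  LOAN_LIMIT = 2 # assumed business rule: at most two books at a time

  attr_reader :loans

  def initialize
    @loans = []
  end

  # Reads like the rule a librarian would state: "a member may borrow a
  # book if it is not on loan and they are under their loan limit".
  def may_borrow?(book)
    !book.on_loan? && loans.size < LOAN_LIMIT
  end

  def borrow(book)
    raise ArgumentError, "#{book.title} cannot be borrowed" unless may_borrow?(book)

    book.lend
    loans << book
  end
end
```

The point is not the rule itself but that `may_borrow?` can be compared line by line with how the librarian phrases it.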

What is the "best" approach

Following these assumptions, you can conclude there is no such thing as "one superior design methodology"; good software design is partly science and partly art.

Let's look at some popular architectural choices made these days.

Rails architecture

Rails is a great platform for quickly building a prototype. You can create models, attach validations to them, render forms using the model's metadata, and more. This is all great for initial speed, but the idea that form validation (view) information comes directly out of the model (persistence) ties the view to the model. That is something I would rather avoid.

The design of Rails rewards this behaviour (speed), and that behaviour eventually leads to the creation of yet another big ball of mud, especially when an application grows beyond 50k lines of code.
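One common way to keep view-level validation out of the persistence model is a separate form object. The sketch below is plain Ruby rather than Rails, and `User`, `SignupForm` and its validation rules are hypothetical; it only shows the direction of the dependency, with the view talking to the form and the form producing the model.

```ruby
# Hypothetical persistence model: it only holds data, no form rules.
User = Struct.new(:email, :name)

# Hypothetical form object: validation that belongs to one screen lives here,
# so the view depends on the form, not on the persistence model.
class SignupForm
  attr_reader :email, :name, :errors

  def initialize(email:, name:)
    @email = email.to_s.strip
    @name = name.to_s.strip
    @errors = []
  end

  def valid?
    @errors = []
    @errors << "email must contain @" unless email.include?("@")
    @errors << "name can't be blank" if name.empty?
    errors.empty?
  end

  # Only a valid form is turned into a model instance.
  def to_user
    raise ArgumentError, errors.join(", ") unless valid?

    User.new(email, name)
  end
end
```

A second screen with different rules (say, an admin form) would get its own form object, while `User` stays unchanged.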

Using Rails as the architectural choice for every domain is essentially saying that MVC is the solution to every problem.

DCI could be a solution in some domains to keep everything clean and simple within an MVC structure.

Data, Context and Interaction to supplement MVC

The paradigm separates the domain model (data) from use cases (context) and roles that objects play (interaction). DCI is complementary to model–view–controller (MVC). MVC as a pattern language is still used to separate the data and its processing from presentation.

Why would we want this?

To improve the readability of object-oriented code by giving system behavior first-class status
To cleanly separate code for rapidly changing system behavior (what the system does) from code for slowly changing domain knowledge (what the system is), instead of combining both in one class interface
To help software developers reason about system-level state and behavior instead of only object state and behavior
To support an object style of thinking that is close to people's mental models, rather than the class style of thinking that overshadowed object thinking early in the history of object-oriented programming languages

So what is it ...

The Context keeps the data decoupled by providing an extra layer on top of the interaction: it introduces roles on the data and directs the interaction between all the components.

Essentially it boils down to this:

1) Data (objects) are simple; they only contain functionality regarding persistence.
2) Roles describe a certain behaviour of a data object.
3) The Context loads the different DataObjects, injects certain roles into them and directs the interaction between them.
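The three steps above can be sketched in plain Ruby with the money-transfer example that is often used to introduce DCI. The names `Account`, `SourceAccount`, `DestinationAccount` and `TransferMoney` are hypothetical; roles are injected here with Ruby's `Object#extend`, which is one way to do it, not the only one.

```ruby
# Data: a plain account that only knows its own state.
class Account
  attr_reader :id, :balance

  def initialize(id, balance)
    @id = id
    @balance = balance
  end

  def increase_balance(amount)
    @balance += amount
  end

  def decrease_balance(amount)
    @balance -= amount
  end
end

# Roles: behaviour a data object takes on only within a context.
module SourceAccount
  def transfer_out(amount)
    raise ArgumentError, "insufficient funds" if balance < amount

    decrease_balance(amount)
  end
end

module DestinationAccount
  def transfer_in(amount)
    increase_balance(amount)
  end
end

# Context: receives the data objects, injects the roles, runs the use case.
class TransferMoney
  def initialize(source, destination, amount)
    @source = source.extend(SourceAccount)
    @destination = destination.extend(DestinationAccount)
    @amount = amount
  end

  def call
    @source.transfer_out(@amount)
    @destination.transfer_in(@amount)
  end
end
```

Note that `extend` leaves the role on the object after the context has finished; some DCI implementations use wrapper objects instead to keep roles scoped to the context.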

So how does this tie into the cohesion and coupling paradigm?

Data objects are kept slim, and no behaviour is specified on them. Since behaviour is specified in Roles, it is decoupled from the Context.

class Context
  def initialize
    # instantiate all data objects and make sure they can
    # perform the requested roles
  end

  # often named call or execute
  def call
    # perform the sequential logic using the data objects
  end
end

So we could say that a Context is coupled to one or more Roles, and a Role is coupled to a DataObject.

Context --> *Roles --> *DataObject

For more information, read: Data, context and interaction

Ruby community heuristics (and the relation to coupling and cohesion)

I have been spending a lot of time in the Ruby community lately, and I often hear the following from experienced Ruby programmers: Tell don't ask, Law of Demeter, Single Responsibility Principle.

It's my belief that all these elements can be traced back to the root ideas of cohesion and coupling.

Tell don't ask

Reduces data coupling, especially two-way coupling. If object A requests data from B and then acts on it, both objects become tightly coupled. There is two-way communication, tying A -> B and B -> A.
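A minimal contrast between the two styles, using a hypothetical account and payment (the names `AskingAccount`, `pay_by_asking` and `TellingAccount` are illustrative only):

```ruby
# "Ask" style: the caller pulls data out of the account and makes the
# decision itself, so it must know the account's rules and internals.
class AskingAccount
  attr_accessor :balance

  def initialize(balance)
    @balance = balance
  end
end

def pay_by_asking(account, amount)
  if account.balance >= amount
    account.balance -= amount
    true
  else
    false
  end
end

# "Tell" style: the caller tells the account what to do; the rule and
# the data stay together inside the object.
class TellingAccount
  attr_reader :balance # exposed for display only

  def initialize(balance)
    @balance = balance
  end

  def withdraw(amount)
    return false if @balance < amount

    @balance -= amount
    true
  end
end
```

In the tell style, changing the withdrawal rule (for example, allowing an overdraft) touches only `TellingAccount#withdraw`; in the ask style every caller like `pay_by_asking` has to change.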

Law of Demeter

Reduces data coupling. If A accesses a specific data structure of B, one cannot freely change the internals of B, since A would have to change as well. Following the law keeps that knowledge internal to B.
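A small sketch of the difference, with hypothetical `Order`, `Customer` and `Address` types: the chained call reaches into B's internals, while the delegating call only talks to a direct collaborator.

```ruby
Address = Struct.new(:city)

Customer = Struct.new(:name, :address) do
  # Customer answers questions about its own address.
  def city
    address.city
  end
end

class Order
  def initialize(customer)
    @customer = customer
  end

  # Violates the Law of Demeter: reaches through customer into address,
  # so Order breaks if Customer changes how it stores its address.
  def shipping_city_chained
    @customer.address.city
  end

  # Follows the Law of Demeter: only talks to its direct collaborator.
  def shipping_city
    @customer.city
  end
end
```

If `Customer` later stores several addresses, only `Customer#city` has to change for `shipping_city`; `shipping_city_chained` would have to be rewritten.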

Single Responsibility Principle

Increases cohesion, which is always a good thing.
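A minimal sketch of splitting responsibilities, assuming a hypothetical invoice domain (`Line`, `InvoiceTotal` and `InvoiceFormatter` are illustrative names): calculating and presenting are separate reasons to change, so they live in separate classes.

```ruby
Line = Struct.new(:description, :price_cents, :quantity)

# Responsibility 1: calculating the total.
class InvoiceTotal
  def initialize(lines)
    @lines = lines
  end

  def cents
    @lines.sum { |line| line.price_cents * line.quantity }
  end
end

# Responsibility 2: presenting the invoice as text.
class InvoiceFormatter
  def initialize(lines)
    @lines = lines
  end

  def to_text
    body = @lines.map { |line| "#{line.quantity} x #{line.description}" }
    total = InvoiceTotal.new(@lines).cents
    (body + [format("Total: %.2f", total / 100.0)]).join("\n")
  end
end
```

A change to the layout touches only `InvoiceFormatter`; a change to pricing (discounts, taxes) touches only `InvoiceTotal`.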

Cohesion and Coupling

So cohesion and coupling should be dominant factors when making architectural choices.

For an overview of the different forms of cohesion and coupling, I have taken their definitions from Wikipedia. They may be an interesting read.

Cohesion

Modules with high cohesion tend to be preferable because high cohesion is associated with several desirable traits of software including robustness, reliability, re-usability, and understandability whereas low cohesion is associated with undesirable traits such as being difficult to maintain, difficult to test, difficult to reuse, and even difficult to understand.

Coincidental cohesion (worst)

Coincidental cohesion is when parts of a module are grouped arbitrarily; the only relationship between the parts is that they have been grouped together (e.g. a "Utilities" class).

Logical cohesion

Logical cohesion is when parts of a module are grouped because they logically are categorized to do the same thing, even if they are different by nature (e.g. grouping all mouse and keyboard input handling routines).

Temporal cohesion

Temporal cohesion is when parts of a module are grouped by when they are processed - the parts are processed at a particular time in program execution (e.g. a function which is called after catching an exception which closes open files, creates an error log, and notifies the user).

Procedural cohesion

Procedural cohesion is when parts of a module are grouped because they always follow a certain sequence of execution (e.g. a function which checks file permissions and then opens the file).

Communicational cohesion

Communicational cohesion is when parts of a module are grouped because they operate on the same data (e.g. a module which operates on the same record of information).

Sequential cohesion

Sequential cohesion is when parts of a module are grouped because the output from one part is the input to another part like an assembly line (e.g. a function which reads data from a file and processes the data).

Functional cohesion (best)

Functional cohesion is when parts of a module are grouped because they all contribute to a single well-defined task of the module (e.g. tokenizing a string of XML).
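To make the two extremes concrete, here is a hypothetical sketch: a `Utilities` module whose helpers have nothing in common (coincidental cohesion), next to a tokenizer whose parts all serve one task (functional cohesion). The tokenizer is deliberately simplified and does not handle comments, CDATA or a `>` inside attribute values.

```ruby
# Coincidental cohesion: unrelated helpers grouped only because they
# had nowhere else to go.
module Utilities
  def self.format_price(cents)
    format("%.2f", cents / 100.0)
  end

  def self.slugify(title)
    title.downcase.strip.gsub(/[^a-z0-9]+/, "-")
  end
end

# Functional cohesion: every part serves one task, splitting a (very
# simplified) XML string into tag and text tokens.
class XmlTokenizer
  TOKEN = /<[^>]+>|[^<]+/

  def initialize(source)
    @source = source
  end

  def tokens
    @source.scan(TOKEN).map { |raw| classify(raw) }
  end

  private

  def classify(raw)
    raw.start_with?("<") ? [:tag, raw] : [:text, raw]
  end
end
```

For example, `XmlTokenizer.new("<a>hi</a>").tokens` yields `[[:tag, "<a>"], [:text, "hi"], [:tag, "</a>"]]`.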

Coupling

Coupling can be "low" (also "loose" and "weak") or "high" (also "tight" and "strong"). Some types of coupling, in order of highest to lowest coupling, are as follows:

Content coupling (high)

Content coupling (also known as Pathological coupling) is when one module modifies or relies on the internal workings of another module (e.g., accessing local data of another module). Therefore changing the way the second module produces data (location, type, timing) will lead to changing the dependent module.

Common coupling

Common coupling (also known as Global coupling) is when two modules share the same global data (e.g., a global variable). Changing the shared resource implies changing all the modules using it.
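A minimal sketch of common coupling through a Ruby global, next to a data-coupled alternative. The `$tax_percent` global and both method names are hypothetical.

```ruby
# Common (global) coupling: every method that reads $tax_percent depends on
# it, so changing its meaning or type means revisiting all of them.
$tax_percent = 21

def gross_cents_global(net_cents)
  net_cents * (100 + $tax_percent) / 100
end

# Data-coupled alternative: the rate is an explicit parameter.
def gross_cents(net_cents, tax_percent)
  net_cents * (100 + tax_percent) / 100
end
```

Any code that reassigns `$tax_percent` silently changes the result of `gross_cents_global` everywhere, while `gross_cents` only depends on what its caller passes in.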

External coupling

External coupling occurs when two modules share an externally imposed data format, communication protocol, or device interface. This is basically related to communication with external tools and devices.

Control coupling

Control coupling is one module controlling the flow of another, by passing it information on what to do (e.g., passing a what-to-do flag).
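A short sketch of control coupling via a what-to-do flag, and one way to remove it by giving each task its own method. The method names and the report format are hypothetical.

```ruby
# Control coupling: the caller passes a flag that selects which branch runs,
# so the caller has to know about the callee's internal logic.
def render_report(rows, as_csv)
  if as_csv
    rows.map { |row| row.join(",") }.join("\n")
  else
    rows.map { |row| row.join(" | ") }.join("\n")
  end
end

# Lower coupling: one method per task; the caller simply picks one.
def render_csv(rows)
  rows.map { |row| row.join(",") }.join("\n")
end

def render_table(rows)
  rows.map { |row| row.join(" | ") }.join("\n")
end
```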

Stamp coupling (Data-structured coupling)

Stamp coupling is when modules share a composite data structure and use only a part of it, possibly a different part (e.g., passing a whole record to a function that only needs one field of it). This may lead to changing the way a module reads a record because a field that the module doesn't need has been modified.
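A minimal sketch of stamp coupling next to data coupling, assuming a hypothetical `Employee` record:

```ruby
Employee = Struct.new(:name, :email, :salary_cents, :address)

# Stamp coupling: the whole record is passed, but only one field is used,
# so the function depends on the shape of the record.
def greeting_from_record(employee)
  "Hello, #{employee.name}"
end

# Data coupling: only the elementary datum that is needed is passed.
def greeting(name)
  "Hello, #{name}"
end
```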

Data coupling

Data coupling is when modules share data through, for example, parameters. Each datum is an elementary piece, and these are the only data shared (e.g., passing an integer to a function that computes a square root).

Message coupling (low)

This is the loosest type of coupling. It can be achieved by state decentralization (as in objects) and component communication is done via parameters or message passing (see Message passing).

No coupling

Modules do not communicate at all with one another.

Object-oriented programming

Subclass Coupling

Describes the relationship between a child and its parent. The child is connected to its parent, but the parent isn't connected to the child.
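A hypothetical sketch of subclass coupling: `LoudNotifier` depends on the name and behaviour of `Notifier#format_message`, while `Notifier` knows nothing about its subclass.

```ruby
class Notifier
  def notify(message)
    deliver(format_message(message))
  end

  def format_message(message)
    "[notice] #{message}"
  end

  # Stand-in for real delivery: returns the text it would send.
  def deliver(text)
    text
  end
end

# The child depends on the parent's method names and behaviour;
# the parent does not reference the child at all.
class LoudNotifier < Notifier
  def format_message(message)
    super.upcase
  end
end
```

Renaming `format_message` in `Notifier` would silently break `LoudNotifier`, which is the one-directional dependency described above.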

Temporal coupling

When two actions are bundled together into one module just because they happen to occur at the same time.