Thursday, December 14, 2006

Pragmatic Software Design

Is there such a thing as minimal design? In my experience there is good design, bad design or no design at all. The easy throw-away line that minimal design and coding is the way to go just does not wash with any rational test. The problem is how to identify and eliminate poorly designed bits of code while recognising and keeping well-designed pieces of code.

How do we identify bad code? The easiest way is to follow a set of rules and match the code against those rules. Design antipatterns are a good place to start. Martin Fowler and friends in Refactoring speak of code smells and have a taxonomic breakdown of the smells that emanate from bad code and the steps to be taken to rectify the problem.

Can we recognise good code? If we start with design patterns, enterprise patterns, language-oriented idioms and set the goal of explicitly naming the patterns and idioms in our code then we have a metric of pattern density, or as I like to call it, the emergence of pattern poetry.

The emergence of structure from goal-seeking behaviour among autonomous agents is a characteristic of all distributed systems including software developers. Some might call it, with apologies to John Vlissides, pattern hatching.

What can be done to rectify poor code? The baby steps of refactoring are a good starting point for improving code structure in the small. The detail of implementation can be directly improved by making incremental changes to the code. To ensure that the behaviour of the code remains correct, it is necessary to have unit tests that exercise each unit of code and confirm that its invariants are satisfied as expected.

Test-Driven Development, in concert with Mock Objects for unit testing, helps to ensure that the expected behaviour of each unit can be modelled, tested and implemented. A corollary is that the contract for each unit must be expressed as an interface or abstract base class so that production and mock implementations can be provided for units to test against.
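
As a rough illustration of that corollary, here is a minimal Java sketch. All of the names (RateSource, FixedRateSource, MockRateSource, Converter) are hypothetical and invented for this example; the mock is written by hand rather than generated by a mocking library.

```java
import java.util.ArrayList;
import java.util.List;

// The contract, expressed as an interface (hypothetical name).
interface RateSource {
    double rateFor(String currency);
}

// Production implementation: a stand-in for something that would consult a real service.
class FixedRateSource implements RateSource {
    @Override
    public double rateFor(String currency) {
        return "EUR".equals(currency) ? 0.76 : 1.0; // made-up figures, for illustration only
    }
}

// Hand-written mock: returns a canned rate and records every request it receives.
class MockRateSource implements RateSource {
    final List<String> requested = new ArrayList<>();
    private final double cannedRate;

    MockRateSource(double cannedRate) {
        this.cannedRate = cannedRate;
    }

    @Override
    public double rateFor(String currency) {
        requested.add(currency);
        return cannedRate;
    }
}

// The unit under test depends only on the interface, never on a concrete class.
class Converter {
    private final RateSource rates;

    Converter(RateSource rates) {
        this.rates = rates;
    }

    double convert(double amount, String currency) {
        if (amount < 0) {
            throw new IllegalArgumentException("amount must be non-negative");
        }
        return amount * rates.rateFor(currency);
    }
}

// A unit test in miniature, written without a test framework to stay self-contained.
class ConverterDemo {
    public static void main(String[] args) {
        MockRateSource mock = new MockRateSource(2.0);
        Converter converter = new Converter(mock);
        double result = converter.convert(10.0, "EUR");
        if (result != 20.0) {
            throw new AssertionError("unexpected result: " + result);
        }
        if (!mock.requested.equals(List.of("EUR"))) {
            throw new AssertionError("unexpected calls: " + mock.requested);
        }
        System.out.println("unit test passed");
    }
}
```

Because Converter sees only the RateSource interface, the same test would run unchanged against the production class; in practice a framework such as JUnit would replace the hand-rolled main method.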

Unit testing, also called glass box or white box testing, is concerned with assuring that the internal behaviour of each component is correct. Integration testing, or black box testing, is concerned instead with validating the external behaviour of various integrated sets of components against the expectations of the customer. Acceptance tests and integration tests are synonyms for tests that are specified by the customer or are derived from the user requirements, otherwise known as the system definition.

It is perfectly clear to most developers that unit testing is sufficient to ensure that components will successfully integrate and interoperate in order to fulfil the purpose for which they were designed. Usually, each component has been assigned a number of specific functional requirements, use cases or user stories during requirements analysis and negotiation, iteration planning or the planning game. The problem is that this crystal-clear conclusion is completely wrong.

A number of units of code cannot just be stitched together and expected to work as anticipated. Each component or unit has its own invariants, constraints and boundary conditions. Some of these match globally across all of the other components while others are distinct constraints. The picture is one of trying to stitch together a large area of fabric from pieces that each have their own size, shape, material and thickness.
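
A small Java sketch of such a mismatch, under invented assumptions: DiscountPolicy and PriceCalculator are hypothetical units, each correct against its own contract, whose constraints disagree only at the seam between them.

```java
// Unit A: reports a discount as a percentage in the range [0, 100].
class DiscountPolicy {
    double discountFor(int itemCount) {
        return itemCount >= 10 ? 15.0 : 0.0;
    }
}

// Unit B: applies a discount expressed as a fraction in the range [0, 1].
class PriceCalculator {
    double apply(double price, double discountFraction) {
        if (discountFraction < 0.0 || discountFraction > 1.0) {
            throw new IllegalArgumentException("discount must be a fraction in [0, 1]");
        }
        return price * (1.0 - discountFraction);
    }
}

class IntegrationDemo {
    // Returns true when the two units compose without violating B's precondition.
    static boolean composes(DiscountPolicy policy, PriceCalculator calculator, int itemCount) {
        try {
            calculator.apply(100.0, policy.discountFor(itemCount));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        DiscountPolicy policy = new DiscountPolicy();
        PriceCalculator calculator = new PriceCalculator();

        // Each unit satisfies its own invariants when tested in isolation.
        System.out.println("A alone: " + (policy.discountFor(10) == 15.0));
        System.out.println("B alone: " + (calculator.apply(100.0, 0.5) == 50.0));

        // Stitched together, they agree only where the discount happens to be zero.
        System.out.println("small order composes: " + composes(policy, calculator, 1));
        System.out.println("large order composes: " + composes(policy, calculator, 10));
    }
}
```

Both units would pass their own unit tests, and even an integration test that only exercised small orders would pass; the disagreement over units appears only when the boundary condition is crossed.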

For the mathematically minded, the situation is analogous to an analytic function in complex space: one that satisfies the Cauchy-Riemann equations, which guarantees the existence of higher-order derivatives and so demonstrates smooth continuity.
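
For reference, the equations in question can be written as follows, splitting f into real and imaginary parts u and v (symbols introduced here for illustration):

```latex
f(z) = u(x, y) + i\,v(x, y), \qquad z = x + i y,
\qquad
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y},
\qquad
\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.
```

A function whose partial derivatives are continuous and satisfy these equations throughout an open region is analytic there, and so has derivatives of every order in that region.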

Most system partitionings into subsystems and components result in units of code that are analogous to distinct regions of complex space, in each of which a function may be locally analytic. However, there are mathematical difficulties when trying to stitch the functions together at the boundaries of these regions.

The problem cannot be remedied by small, incremental baby steps to refactor the code. Fowler's large refactorings cannot be done in baby steps at all. Continuous integration that requires code to compile against the most recent versions of all other components in the source repository is not possible.

Such a backward-looking constraint causes the design of the code to remain in a local minimum in the phase space of design possibilities with little or no chance of breaking out into a quality design.

Evidence of progress is a continually breaking build: either an advance or change made to one or more component interfaces causes compilation failure, or a change in one or more component implementations causes unit or integration test failures.

The need for unit and integration tests is paramount if the diverse components are to be integrated successfully. So is the need for interface contracts, which allow the unit and integration tests to be coded in the first place using mock objects.

The result is confidence in the code, the unit tests and validation of acceptance and integration tests. This is a pragmatic approach and there is nothing minimal about it.

Sunday, December 10, 2006

Modern Object-Oriented Programming

The most popular object-oriented programming (OOP) languages of C++, Java and C# are very similar to each other and indeed to their progenitors of Smalltalk, Simula, Algol and C. Similarly, the programming skills employed are similar regardless of the development language that has been chosen. Each of the modern languages shares structured programming constructs like if-then-else, for-next loops, or equivalents, and so on, with each other and their older cousins. With each other they share constructs that can be used to build classes, employ polymorphism and specify type safety within the bounds of the domain-specific language (DSL) that extends the base language for the problem domain in which we are working.

That there are so many commonalities between the three modern languages is unsurprising since C++ was the first programming language to cohesively tie together object-oriented concepts and Java copied, sometimes insensitively, from C++. Without doubt C# is a copy of Java with a few extensions along the lines of syntactic sugar (e.g. properties and indexers, perhaps itself a workaround for the absence of operator overloading). Even more telling is the fact that Java and C# betray their own origins by adopting the very features that they initially dropped from C++ and that had previously been decried by the language designers as bad features. The features that were dropped from Java when otherwise copying C++ syntax include multiple inheritance, operator overloading and templates, otherwise known as generics. C# included operator overloading in its first incarnation as a throwback to its supposed C++ lineage but copied the Java path, perhaps set by Smalltalk, in disallowing multiple inheritance and generics. These and other features have crept their way into Java and C#; however, it is arguable whether programmers are more productive, notwithstanding the marketing of the respective sponsor organisations for each language.

The exploration of the details behind the key constructs of C++, the equivalent code in Java and C#, the extensions and omissions in those languages, and the revised C++ standard, i.e. C++0x and Technical Report 1 (TR1), following over a decade of stability, is beyond the scope of the present discussion. The points of note are that the languages are very similar and appear, at first glance, then second and third blush, to borrow insensitively from C++, clearly their motivation. Java differs quite markedly in some syntax while maintaining the same class flavour, while the more recent C# seems, to invoke evolutionary atavism, to be a throwback to C++ with a few enriching extensions. All three are, for all intents and purposes, equivalent for the purposes of business programming, among others, and, being general-purpose programming languages, any one can in general implement with relative ease the same solution as either of the other two.

The building blocks of complex software are groups of components that number several classes grouped together into a library, jar file or assembly. The opposing forces of decoupling and cohesion determine which classes and components to group together into a single unit of release. Robert Martin's Principles of Object-Oriented Design speaks of the Common Closure Principle (CCP) and Common Reuse Principle (CRP) for grouping classes together; the Dependency Inversion Principle (DIP) and Interface Segregation Principle (ISP) for classes to depend on abstractions and client-specific, or narrow, rather than fat interfaces.
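
To make the last two principles concrete, here is a minimal Java sketch. The names (SettingsReader, SettingsWriter, InMemorySettings, ReportGenerator) are hypothetical and chosen only for illustration; they do not come from Martin's text.

```java
import java.util.HashMap;
import java.util.Map;

// ISP: two narrow, client-specific interfaces instead of one fat "store" interface.
interface SettingsReader {
    String read(String key);
}

interface SettingsWriter {
    void write(String key, String value);
}

// The low-level detail implements both abstractions.
class InMemorySettings implements SettingsReader, SettingsWriter {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public String read(String key) {
        return data.get(key);
    }

    @Override
    public void write(String key, String value) {
        data.put(key, value);
    }
}

// DIP: the high-level class depends on an abstraction, and only on the part it needs.
class ReportGenerator {
    private final SettingsReader source;

    ReportGenerator(SettingsReader source) {
        this.source = source;
    }

    String titleLine() {
        return "Report: " + source.read("title");
    }
}

class ReportDemo {
    public static void main(String[] args) {
        InMemorySettings settings = new InMemorySettings();
        settings.write("title", "Quarterly figures");
        System.out.println(new ReportGenerator(settings).titleLine());
    }
}
```

ReportGenerator cannot write settings even by accident, and any other SettingsReader, including a mock, can be substituted without touching it.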

Beyond design patterns, at a higher level of abstraction, one may speak of analysis patterns and enterprise patterns or, more generally, of application and framework architecture. The elements of architecture include frameworks of reusable components that are idiomatic for each programming language, so that C++, Java and C# libraries may be unique in design to take advantage of each language's benefits. Examples include OpenGL and Open Scene Graph (OSG) in C++; Spring for Inversion of Control (IoC) or dependency injection in Java; COM+ and Enterprise Services in C#, the natural language of .NET. Other frameworks may be similar or identical for all of these, including Corba-IIOP, SOAP/SOA, ESB and Brokers at a high level; the Hibernate persistence framework, xUnit test frameworks (i.e. CppUnit, JUnit and NUnit) and logging frameworks (i.e. log4cpp, log4j and log4net) at a lower level.

Similar to analysis patterns that transform into a myriad of design patterns, each of which can be implemented in any language and reflect the customary idiom, Corba-IIOP and DCE-RPC turn up as Corba in C++, RMI-IIOP in Java and .NET Remoting in C#.NET; the message queuing products WebSphere MQ, Java Message Queue (JMQ) and MSMQ are platform and language independent to a greater or lesser extent. By their nature, integration technologies need to span various technologies and platforms in order to serve their purpose. Examples include the Adaptive Communication Environment (ACE) and the TAO Corba ORB (Object Request Broker) in C++; perhaps the Internet Communications Engine (ICE), with bindings to these and other platforms, is the most cohesive of all. The same can be said for transaction models, which seem much alike across IBM CICS, BEA Tuxedo, Java EE, Microsoft MTS/COM+ and .NET Enterprise components.

The modern concept of an application server embodies many of these concepts in various combinations.