Sunday, January 31, 2016

SOLID Principles summary for CodeCampSDQ 5.0 attendees

This is a summary of the SOLID principles to be used as an index by the CodeCampSDQ 5.0 (2016) attendees.

Are you lost?

If you don't know what the SOLID principles are, or what CodeCampSDQ is, please read the following two sections. If you attended the event and are looking for the talk material, jump to What is SOLID?

What is CodeCampSDQ?

In summary (check out the official site), CodeCampSDQ is
Just a group of "idealistic dreamers" who think they can change the world, one CodeCampSDQ at a time.
With the following mission (as of this writing)
The technical conference CodeCampSDQ seeks to educate the professional Information Technology (IT) community of the Dominican Republic, fostering a spirit of collaboration and knowledge sharing among our attendees.
The conference's official site is http://codecampsdq.com/ and its Facebook page is https://www.facebook.com/CodeCampSDQ

Is the 5.0 (2016) a product version?

No, it means that this is the fifth event and that it happened in 2016.


What is SOLID?

SOLID is a mnemonic acronym for five basic principles of object-oriented (OO) programming and design. The principles were stated by Uncle Bob (Robert C. Martin) in the early 2000s.

The initials stand for:
  • S: Single responsibility principle (SRP)
  • O: Open/closed principle (OCP)
  • L: Liskov substitution principle (LSP)
  • I: Interface segregation principle (ISP)
  • D: Dependency inversion principle (DIP)


Extended version of each principle

I've written a summary of each principle, presented in a problem-solution way. Each post has the following basic structure:
  • A brief description of the principle
  • Sample code not honoring the principle, aka the Dirty version
  • An explanation of why the code violates the principle, along with the proposed solution
  • The solution presented as the Clean version, and
  • Finally, an explanation of how the principle relates to Agile Software Development

This is the list of posts:
  • SOLID Principles: SRP and DIP (1 of 4)
  • SOLID Principles: OCP (2 of 4)
  • SOLID Principles: LSP (3 of 4)
  • SOLID Principles: ISP (4 of 4)


Where is the code?

All code samples used in the four posts are in a single GitHub repo: lsolano/blog.solid.demo


Monday, January 18, 2016

SOLID Principles: ISP (4 of 4)

If you missed the previous post in this series, check out SOLID Principles: LSP (3 of 4).

Interface segregation principle: ISP

The principle states that
"No client should be forced to depend on methods it does not use."
In other words, the principle says that components should be discrete: if you ask someone their name, you don't want to hear that they have been divorced four times and like Italian food.

Imagine you are working with an input mapped to a camera. The stream of data coming from that source is Read Only and Non Seekable: you have no way to "write" to the camera feed, or to seek to a specific time; you either read the data or lose it. But because you are using a general-purpose IO framework, you end up with an InputStream exposing reset() and mark() / seek() methods.

For cases like this, the ISP says to split the interface into as many smaller ones as the clients need. The division is done by looking at the "roles" the initial interface must play. In our case the InputStream plays both the Read Only and Non Seekable roles. The first one is implicit in the name prefix "Input", so we don't need another interface to indicate a Read Only stream. But for the second role we need a way to say that our stream does not allow seek operations. So we could split the InputStream into two components: InputStream and SeekableStream. The first one stays as is, with the seek-related methods removed; the second one holds only the seek-related methods.
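Here is a minimal Scala sketch of that split (hypothetical names and signatures, not the real java.io API):

trait InputStream {
  def read(buffer: Array[Byte]): Int // the Read Only role, implied by the "Input" prefix
}

trait SeekableStream {
  def mark(readLimit: Int): Unit // the seek-related role, now segregated
  def reset(): Unit
  def seek(position: Long): Unit
}

// A camera feed implements only the reading role...
class CameraInputStream extends InputStream {
  def read(buffer: Array[Byte]): Int = ???
}

// ...while a media file can honestly play both roles.
class MediaFileStream extends InputStream with SeekableStream {
  def read(buffer: Array[Byte]): Int = ???
  def mark(readLimit: Int): Unit = ???
  def reset(): Unit = ???
  def seek(position: Long): Unit = ???
}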

Related Concepts

In the previous example we have an initial InputStream playing too many roles: such interfaces are usually called "Fat Interfaces".

After the cleaning process, we end up with two more specific interfaces: one to only read, and another allowing us to seek. Those interfaces are called "Slim Interfaces" or "Role Interfaces", meaning each describes a single role.

Of course, that does not mean we are unable to implement both interfaces in a single concrete component. But for clients expecting just one of the roles, we can declare that the given component fulfills that role, and no more, by casting the concrete implementation to the needed role interface.

Something like this:
CameraOrMediaInputStream camInput = new CameraOrMediaInputStream(...);
CameraReader camReader = new CameraReader((InputStream)camInput);

CameraOrMediaInputStream mediaInput = new CameraOrMediaInputStream(...);
MediaReader mediaReader = new MediaReader((SeekableStream)mediaInput);

The casts are redundant in many OO languages, but they were added to suggest the signatures of the two constructors expecting the streams.

Be aware that "Role Interfaces" does not mean "Single Method Interfaces": a role may be fulfilled by a single method, or by a sequence of calls to related methods.

Code sample

Code samples are in the GitHub repo lsolano/blog.solid.demo
The code was written in Scala 2.11.7 on JVM 1.8.0u20. The example for this principle is an Automatic Teller Machine (ATM). Our ATM is located in an international airport, so it must handle withdrawals with currency conversions. The ATM also supports deposits of both cash and cheques.

Here is a summary of the use cases:

The initial class diagram for these use cases is:
Fat Interface

Here is the code for the "fat" ATMInteractor component:
class ATMInteractor(securityService: SecurityService, withdrawalService: WithdrawalService,
    depositService: AnyRef, currencyRatesService: AnyRef) {
  require(securityService != null)
  require(withdrawalService != null)
  require(depositService != null)
  require(currencyRatesService != null)

  def validate(request: CustomerValidationRequest): CustomerValidationResponse = {
    val validation = securityService.validateCustomer(PlasticInfo(request.secret, request.pin))

    CustomerValidationResponse(validation.valid)
  }

  def withdrawal(request: WithdrawalRequest): TransactionResponse = {
    val response = withdrawalService.withdrawal(com.malpeza.solid.isp.model.Withdrawal(request.pin, request.amount))
    val failReason = response.reason match {
      case Some(r) => r match {
        case com.malpeza.solid.isp.model.InsufficientBalance => InsufficientBalance
        case _ => CallBank
      }
      case _ => CallBank
    }

    TransactionResponse(response.done, Option(failReason))
  }
  
  def deposit(request: AnyRef) = ???
}

object ATMInteractor {
  def apply(securityService: SecurityService, withdrawalService: WithdrawalService,
    depositService: AnyRef, currencyRatesService: AnyRef): ATMInteractor = {
    new ATMInteractor(securityService, withdrawalService, depositService, currencyRatesService)
  }
}

What is wrong with this code: ISP?

Simply put, this component is doing too much. It handles all the use cases: Customer Validation, Withdrawal, and Deposit. Because of that, it has too many dependencies (services).

If you are validating a customer you don't need to know anything about Deposits or Withdrawals.

Cleaning the code

Honoring the ISP

We need to split all these responsibilities and create specialized components, each able to handle one interaction (transaction). So we'll end up with three components named CustomerValidation, Withdrawal, and Deposit, each representing a use case. To keep this post short, the Deposit and the Withdrawal-with-currency-conversion scenarios were left out.

This is the revised class diagram showing only the important components:
Clean Interfaces


CustomerValidation code:
class CustomerValidation(securityService: SecurityService) {
  require(securityService != null)

  def validate(request: CustomerValidationRequest): CustomerValidationResponse = {
    val validation = securityService.validateCustomer(PlasticInfo(request.secret, request.pin))

    CustomerValidationResponse(validation.valid)
  }
}

object CustomerValidation {
  def apply(securityService: SecurityService) = new CustomerValidation(securityService)
}

Withdrawal code:
class Withdrawal(withdrawalService: WithdrawalService) {
  require (withdrawalService != null)
  
  def withdrawal(request: WithdrawalRequest): TransactionResponse = {
    val response = withdrawalService.withdrawal(com.malpeza.solid.isp.model.Withdrawal(request.pin, request.amount))
    val failReason = response.reason match {
      case Some(r) => r match {
        case com.malpeza.solid.isp.model.InsufficientBalance => InsufficientBalance
        case _ => CallBank
      }
      case _ => CallBank
    }

    TransactionResponse(response.done, Option(failReason))
  }
}

object Withdrawal {
  def apply(withdrawalService: WithdrawalService) = new Withdrawal(withdrawalService)
}

Agile Link

With the ISP we gain Orthogonality. By that concept I mean the following meanings:
statistically independent
or
non-overlapping, uncorrelated, or independent objects of some kind

In other words, we get components with little or no relation to each other. If one must change, the other(s) remain untouched. This minimizes the impact of changes.

Orthogonality can be applied to architectural layers such as IO, Persistence, Transactions, Logging, etc., or to behaviors, as in the sample project presented here. The last version has a clear separation of Use Cases. We could implement all those behaviors in a single component, but that would make the code harder to maintain.

With such a small number of use cases it is difficult to see the need for this (over)engineering, but try to imagine the following:

You are working on an online retailing product's back-end API. It's the fifth year since the API's initial version, and you have about 500 different use cases, each with some pretty complicated scenarios such as:
  • Checkout with mixed payment methods: Coupons + Prepaid Card + Credit Card + PayPal, etc.
  • Manage upper bounds for customers by category to protect them from errors or malicious actions
  • Single purchase with multiple items with arbitrary groups allowing for shipment to different addresses
  • Manufacturers promotions by random distribution for customers matching certain profiles
  • ...

How much time do you think you would spend each time you need to add a new feature or change an existing one, if all these cases were implemented in a single component, or distributed within a small group of components?

Series links

Previous, SOLID Principles: LSP (3 of 4)
Next: none, this is the last one.

Wednesday, January 6, 2016

SOLID Principles: LSP (3 of 4)

If you missed the previous post in this series, check out SOLID Principles: OCP (2 of 4).

Liskov substitution principle: LSP

This principle states that:
"objects in a program should be replaceable with instances of their sub-types without altering the correctness of that program."
This introduces us to the concept of Substitutability.
Substitutability is a principle in object-oriented programming. It states that, in a computer program, if S is a sub-type of T, then objects of type T may be replaced with objects of type S (i.e., objects of type S may substitute objects of type T) without altering any of the desirable properties of that program (correctness, task performed, etc.). 
Put simply: if you say you are a Duck, be one fully. LSP differs from duck typing (0) in that the former is more restrictive than the latter. Duck typing says that for an object to be used in a particular context it must have a certain list of methods and properties, leaving out the details of what happens when you use those methods and properties (the object's internals).
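As a quick illustration, duck typing can be sketched in Scala with a structural type; the function below only demands that a quack() method exists, saying nothing else about behavior (a sketch, not part of the series repo):

import scala.language.reflectiveCalls

// Duck typing: the context only requires that quack() exists; no common super-type is needed.
def makeItQuack(duck: { def quack(): String }): String = duck.quack()

class RubberDuck { def quack(): String = "squeak" }

makeItQuack(new RubberDuck) // accepted: it quacks, never mind what else it is

So how restrictive are the LSP sub-typing rules? Here is a summary of the rules: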
  • Requirements on signatures
    • Contravariance (1) of method arguments in the sub-type.
    • Covariance (1) of return types in the sub-type.
    • No new exceptions should be thrown by methods of the sub-type, except where those exceptions are themselves sub-types of exceptions thrown by the methods of the super-type.
  • Behavioral conditions
    • Preconditions (2) cannot be strengthened in a sub-type.
    • Postconditions (2) cannot be weakened in a sub-type.
    • Invariants (2) of the super-type must be preserved in a sub-type.
  • History constraint (the "history rule"): sub-types must not allow state changes that are impossible for the super-type. This matters because sub-types may introduce new methods that alter state in ways not defined by the parent type. So the history you observe after a given sequence of method calls must be consistent between the super-type and the sub-type.

Principle requirements explained (top-down):

Contravariance of method arguments: 
If your father accepts Animals, you must also accept Animals, not just Cats.

Covariance of return types in the sub-type:
If your father returns Animals, you are safe to only return Cats, because all Cats are also Animals.
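These two signature rules are visible in Scala's Function1[-A, +R] type, which is contravariant in its argument and covariant in its result. A minimal sketch (not part of the series repo):

class Animal
class Cat extends Animal

// A client expects a function taking a Cat and returning an Animal.
val expected: Cat => Animal = (c: Cat) => c

// A function accepting any Animal (broader argument) and returning a Cat
// (narrower result) is a valid substitute.
val substitute: Animal => Cat = _ => new Cat
val alsoFine: Cat => Animal = substitute // compiles: Function1[Animal, Cat] <: Function1[Cat, Animal]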

No new exceptions:
If your father throws an AnimalException you are free to throw AnimalTooLargeException but not a ChairException.

Preconditions cannot be strengthened in a sub-type:
Imagine that you have a VeterinaryClinic with an operation called sacrifice(pet: Animal), with a contract saying that for an animal to be sacrificed it must have a deadly disease (hasADeadlyDisease property returning true).

Later you derive a VainVeterinaryClinic with a new rule: you only sacrifice pretty, deadly ill animals. If they are ugly and have a deadly illness, you let them suffer. Now this subclass has a more restrictive precondition (contract) for sacrificing animals. Wherever your code expects a Vet Clinic instance and you pass the Vain one, you could receive an exception when trying to sacrifice an ugly, deadly ill animal.
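A minimal Scala sketch of this violation, using hypothetical names taken from the prose above:

class Animal(val hasADeadlyDisease: Boolean, val isPretty: Boolean) {
  var isAlive: Boolean = true
}

class VeterinaryClinic {
  // Contract: a deadly disease is the only precondition.
  def sacrifice(pet: Animal): Unit = {
    require(pet.hasADeadlyDisease)
    pet.isAlive = false
  }
}

class VainVeterinaryClinic extends VeterinaryClinic {
  override def sacrifice(pet: Animal): Unit = {
    require(pet.isPretty) // LSP violation: a stronger precondition than the parent's
    super.sacrifice(pet)
  }
}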

Postconditions cannot be weakened in a sub-type:
Continuing with this cruel example: if you have a post-condition for the sacrifice(pet: Animal) method indicating that after the method completes the animal must be dead (isAlive returning false), then a sub-type of the Vet Clinic class cannot alter this behavior by leaving animals alive after the sacrifice operation. That would be a more permissive (weaker) contract than the one defined by the super-type.
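Continuing the sketch above, a sub-type weakening the post-condition would look like this:

class MercifulVeterinaryClinic extends VeterinaryClinic {
  // LSP violation: after sacrifice(...) the animal may still be alive,
  // a weaker post-condition than the super-type's "isAlive must be false".
  override def sacrifice(pet: Animal): Unit = ()
}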

Invariants of the super-type must be preserved in a sub-type:
If the Animal class defines the name property as immutable, then Cats cannot be renamed. For Animal instances the name is an invariant throughout each object's lifetime, so it must also hold for Cats.

History constraint:
Imagine an animal cage with two operations, enter(a: Animal) and takeOut(): Animal; the cage also has a read-only property called isEmpty: boolean. If you look back at a sequence of calls to these methods (the history of an instance), you can easily predict the final value of isEmpty.
  • enter(a) => takeOut() => enter(a): isEmpty is false
  • enter(a) => takeOut() => enter(a) => takeOut(): isEmpty is true

Then if you derive a CatCage from AnimalCage, the former must ensure that the same relation holds true. You cannot have a CatCage saying it is empty after an enter(a: Animal) call.
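A small Scala sketch of a cage honoring the history rule (hypothetical names again):

import scala.collection.mutable

class AnimalCage {
  private val animals = mutable.Stack[Animal]()
  def enter(a: Animal): Unit = animals.push(a)
  def takeOut(): Animal = animals.pop()
  def isEmpty: Boolean = animals.isEmpty
}

// A CatCage may restrict what enters it, but after enter(a) it must
// never report isEmpty == true: the super-type's history must hold.
class CatCage extends AnimalCage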

These rules are better explained using a strongly typed language such as Java or C#; you can check out the following test classes and their associated classes under test: NonGenericPetCageTests.java and GenericPetCageTests.java (in the java sub-directory).

Side note: Who is Barbara Liskov?

Very, very short Bio

Barbara Liskov (born November 7, 1939 as Barbara Jane Huberman) is an American computer scientist who is an institute professor at the Massachusetts Institute of Technology (MIT) and Ford Professor of Engineering in its School of Engineering's electrical engineering and computer science department.

Among her achievements: together with Jeannette Wing, she developed a particular definition of sub-typing, commonly known as the Liskov substitution principle (LSP). For more information check her bio at the MIT web site.

Why am I writing about her?

In our industry (software development), and in particular in my country, the Dominican Republic, we have a huge disparity between male and female personnel. I believe this is holding us back because, as with any aspect of life, diversity is good for avoiding biases and narrow thinking.

Code sample

Code samples are in the GitHub repo lsolano/blog.solid.demo
The code was written in JavaScript on Node.js v4.2.4. The sample API is a logging library. We'll focus our attention on the Appenders cluster (family). The main components are:
  • The API entry point, called SOLIDLogs; from there you get instances of loggers and appenders
  • The Logger, with a very simple implementation: it supports only debug() and error() operations.
  • The Appender interface (contract) and all its implementations: Console, REST, DB, etc. Each appender is responsible for sending the messages from the logger to its destination based on some configuration
All Level information was left out to keep the API as simple as possible. We assume the level is ALL, so debug and error messages are always sent to the appenders.

Here is the diagram of the API in its first (dirty) version:
Dirty logging API

This is the first version of the Appender Base (BlackHoleAppender) and its derivatives, ConsoleAppender and MySQLAppender.

BlackHoleAppender (aka AppenderBase)
function BlackHoleAppender(args) {
  this.name = (args.name || 'defaultAppender');
};

BlackHoleAppender.prototype.normalizeMessage = function(message, category) {
 return message;
}

BlackHoleAppender.prototype.append = function(message, category) {
  var finalMessage = this.normalizeMessage(message, category);
  this.write(finalMessage, category);
};

BlackHoleAppender.prototype.write = function(finalMessage, category) {
  /* To be implemented by sub-classes */
};

module.exports = BlackHoleAppender;

ConsoleAppender
const maxMessageLength = Math.pow(2, 30);

var AppenderBase = require('./BlackHoleAppender.js');

function ConsoleAppender(cons, maxLength) {
 this.console = (cons || console);
 this.maxLength = maxLength || 0;
 var finalArgs = ['ConsoleAppender'];
 Array.prototype.forEach.call(arguments, function(arg) {
  finalArgs.push(arg);
 });
 AppenderBase.apply(this, Array.prototype.slice.call(finalArgs));
}

ConsoleAppender.prototype = new AppenderBase('ConsoleAppender');

ConsoleAppender.prototype.write = function(finalMessage, category) {
  switch (category) {
  case "Error":
   this.console.error(finalMessage);
   break;
  default:
   this.console.debug(finalMessage);
 }
};

ConsoleAppender.prototype.normalizeMessage = function(message, category) {
 var msg = (message || '');
 var allowedLength = (this.maxLength || maxMessageLength);
 msg = msg.length > allowedLength? msg.substring(0, allowedLength) : msg;
 return msg;
}

module.exports = ConsoleAppender;

MySQLAppender
const maxMessageLength = Math.pow(2, 14);

var AppenderBase = require('./BlackHoleAppender.js');

function MySQLAppender(rdbmsRepository, maxLength) {
 this.repository = rdbmsRepository;
 this.maxLength = maxLength || 0;
 var finalArgs = ['MySQLAppender'];
 Array.prototype.forEach.call(arguments, function(arg) {
  finalArgs.push(arg);
 });
 AppenderBase.apply(this, Array.prototype.slice.call(finalArgs));
}

MySQLAppender.prototype = new AppenderBase('MySQLAppender');

MySQLAppender.prototype.write = function(finalMessage, category) {
  switch (category) {
  case "Error":
   this.repository.persist( { text: finalMessage, category: category });
   break;
  default:
   this.repository.persist( { text: finalMessage, category: "Debug" });
 }
};

MySQLAppender.prototype.normalizeMessage = function(message, category) {
 var msg = (message || '');
 var allowedLength = (this.maxLength || maxMessageLength);
 msg = msg.length > allowedLength? msg.substring(0, allowedLength) : msg;
 return msg;
}

module.exports = MySQLAppender;

To see how the API is used, take a look at the following files: /js/test/dirtyTest.js and /js/test/cleanTest.js.

What is wrong with this code: LSP?

Here we have one base class and two derivatives. The base does nothing with the passed message in the normalization step; it just returns the same thing. So its contract is "allow all messages".

In contrast, the Console and MySQL appenders have a limit on the message length, so they are indeed tightening their base-type's contract. That means that where an Appender is expected and we pass the Console or MySQL version, parts of the message could be silently truncated. This is a direct violation of the LS principle: sub-types of the base appender must accept the same input range handled by the base-type.


Cleaning the code

Honoring the LSP

As of this writing (2016), we have the following limitations on string length when targeting the following platforms / products:
  • UTF-8 is a "variable-length" encoding ranging from 1 to 4 bytes per character. It can encode all UNICODE characters, so we must assume that UTF-8 will be used to store the messages sent to the appenders.
  • The worst-case scenario with UTF-8 is that every character in a string uses 4 bytes, so we must divide the total byte capacity of the storage medium by 4 to know the safe maximum length.
  • JavaScript implementations can handle from 2^20 to 2^32 bytes per string. If we divide 2^32 by 4 we get 2^30, so for the ConsoleAppender the maximum allowed message length will be 2^30.
  • MySQL (5.x) has a limit for string (varchar) columns of 2^16 bytes; again, divided by 4 this yields 2^14.
  • SQL Server has a 2^30 bytes limit, which divided by 4 gives us 2^28.
With that information, and knowing that in order to honor the LSP the sub-types of BlackHoleAppender must allow at least the same message size as the super-type, we must make the base appender handle the minimum possible message size: 2^14, from the MySQL implementation.

To do that, we should "declare" the fact that appenders now handle an explicit maximum message size. We must also decide what to do when the message is longer than expected. To solve this we'll introduce the limit as the property maxLength, and the behavior when it is exceeded as an enumeration with only two possible values: Truncate (default) and Ignore.

As an invariant, and in order to comply with the LSP, no sub-type of BlackHoleAppender should limit messages to a shorter length than its parent. All these changes are captured in the following diagram:
Clean logging API
The orange components (and notes) hold all the relevant changes. Now let's see the base and console appenders:

BlackHoleAppender (cleaned):
var msgHandling = require('../messageHandling');

function BlackHoleAppender(name, config) {
  this.name = (name || 'blackHole');
  this.maxLength = (!!config.baseMaxLength? config.baseMaxLength : 0);
  this._messageHandling = msgHandling.truncate;
};

Object.defineProperty(BlackHoleAppender.prototype, 'messageHandling', {
 get: function() {
  return this._messageHandling;
 },
 set: function(messageHandling) {
  return this._messageHandling = messageHandling;
 }
});

BlackHoleAppender.prototype.normalizeMessage = function(message, category) {
 var msg = (message || '');

 if (msg.length > this.maxLength) {

  if (this.messageHandling === msgHandling.truncate) {
   msg = msg.substring(0, this.maxLength);
  } else {
   msg = null;
  }
 }
 
 return msg;
}

BlackHoleAppender.prototype.append = function(message, category) {
  var finalMessage = this.normalizeMessage(message, category);

  if (!!finalMessage) {
   this.write(finalMessage, category);
  }
};

BlackHoleAppender.prototype.write = function(finalMessage, category) {
  /* To be implemented by sub-classes */
};

module.exports = BlackHoleAppender;

ConsoleAppender (cleaned):
var AppenderBase = require('./BlackHoleAppender.js');

function ConsoleAppender(cons, config) {
 var finalArgs = ['console', config];
 Array.prototype.forEach.call(arguments, function(arg) {
  finalArgs.push(arg);
 });

 AppenderBase.apply(this, Array.prototype.slice.call(finalArgs, 0, 2)); /* pass ('console', config) to the base, not the trailing raw arguments */

 this.maxLength = Math.max(config.consoleMaxLength || 0, this.maxLength); /* guard: the override may be absent */
 this.console = (cons || console);
}

ConsoleAppender.prototype = new AppenderBase('console', {});

ConsoleAppender.prototype.write = function(finalMessage, category) {
  switch (category) {
  case "Error":
   this.console.error(finalMessage);
   break;
  default:
   this.console.debug(finalMessage);
 }
};

module.exports = ConsoleAppender;

API usage example (for real usage see mocha tests):
/* Optional: Appenders configuration override */
var configOverride = null,
    /* Boundary interface: How to "talk" with mySQL */
    repository = new DBMSXYZRepo(),
    /* The logger */ 
    logger = SOLIDLogs.getLogger(),
    /* Some appender */ 
    appender = SOLIDLogs.getMySQLAppender(repository, configOverride);

appender.messageHandling = msgHandling.ignore; /* default is msgHandling.truncate */
logger.addAppender(appender);

logger.debug('Hello World!');
logger.error('The World Is On Fire!!!');


Agile Link

With the LS principle we gain predictability. I like the following meaning found in the Oxford dictionary:
The fact of always behaving or occurring in the way expected.

Simply put, with LSP we avoid surprises (aka WTFs). We avoid wasting time chasing bugs from badly behaving components. If we are designing a framework, we can create tests to be run by people making extensions or sub-types of our base types, and also by people implementing our contracts (pure abstractions, interfaces, or just words on paper).

With components designed like this, we are able to improve our estimates for change requests. Our velocity does not bounce dramatically, and we get a sense of confidence both internal (the team) and external (stakeholders). These benefits help to build a long-term team, and reduce stress levels and the staff turnover rate.


Series links

Previous, SOLID Principles: OCP (2 of 4)
Next, SOLID Principles: ISP (4 of 4)

Sunday, January 3, 2016

SOLID Principles: OCP (2 of 4)

If you missed the previous post in this series, check out SOLID Principles: SRP and DIP (1 of 4).

Open Closed Principle

The OC principle states that
"software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification"
Let's examine the meaning of extension and modification:
  • Extension means that the current functionality can be enriched relatively easily.
  • Modification means the most insulting of the changes we can make to a component: touching its source code. This is what we must try hard not to do.
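The idea can be sketched in a few lines of Scala (hypothetical names, unrelated to the sample repo): new behavior arrives as new implementations of a contract, while the component consuming the contract stays untouched.

trait GreetingFormatter {
  def format(name: String): String
}

class PlainFormatter extends GreetingFormatter {
  def format(name: String): String = s"Hello, $name"
}

// Extension: a new formatter is added without editing any existing class.
class ShoutingFormatter extends GreetingFormatter {
  def format(name: String): String = s"HELLO, ${name.toUpperCase}!!!"
}

class Greeter(formatter: GreetingFormatter) {
  def greet(name: String): String = formatter.format(name) // closed for modification
}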

Code sample

Code samples are in the GitHub repo lsolano/blog.solid.demo
This sample is built using C# and the .NET Framework 4.5. It is a "Social Network Hub": imagine you are a very busy / lazy person, and because of that you don't want to (or don't have the time to) constantly check all the different social networks for your friends' updates. You want a single place to look for Facebook, Twitter, Instagram, etc. updates. You start by devising a software component called SocialNetworksHub that will check all your sources and present the information in a consolidated way.

To keep the model simple, suppose we are only interested in the following data elements for each entry: text (no videos, images, audio, etc.), tags (labels, hash-tags, etc.), publication date, source, and author. The "normal" behavior for the component is to have a polling interval for social networks without a push API, and to listen for push notifications from the ones that support them.

The hub has some helper components; they know how to "talk" to specific social network APIs and convert that data to our internal model. Those components are called "collectors" and come in two kinds: Pull (on demand) and Push / Callback. The latter are listeners for social networks supporting push notifications. All entries are sorted by date, the most recent ones first. Finally, the hub supports a predefined "filtering" capability: it filters entries containing "bad words".

Here is the "dirty" version of this component:
public class DirtySocialNetworksHub : SocialNetworksHub
{
    private readonly IList<SocialNetworkPullCollector> collectors;
    private readonly bool removeEntriesWithBadWords;

    private readonly IEnumerable<String> badWords 
            = (new List<String> { "nestedLoops", "deadCode", "copy-n-paste" })
                .Select(word => word.ToLowerInvariant()).ToList();

    public DirtySocialNetworksHub()
        : this(false)
    {
    }

    public DirtySocialNetworksHub(bool removeEntriesWithBadWords)
    {
        this.collectors = new List<SocialNetworkPullCollector>();
        this.removeEntriesWithBadWords = removeEntriesWithBadWords;
    }

    public void AddCollector(SocialNetworkPullCollector collector)
    {
        this.collectors.Add(collector);
    }

    public IEnumerable<Entry> GetEntriesSince(DateTime since)
    {
        var entries = this.collectors.SelectMany(collector => collector.Collect(since));

        if (this.removeEntriesWithBadWords) {
            return entries.Where(entry => 
                !this.badWords.Any(badWord => entry.Text.ToLowerInvariant().Contains(badWord))
             && (!entry.Tags.Any() || entry.Tags.Select(tag => tag.ToLowerInvariant())
                    .Where(tag => this.badWords.Any(badWord => tag.Contains(badWord))).Count() == 0))
                            .OrderBy(entry => entry.Date).Reverse();
        }

        return entries.OrderBy(entry => entry.Date).Reverse();
    }
}


This is an overview of all the components; the class above is an implementation of the orange contract (interface):
First class layout

What is wrong with this code: OCP?

If you look closely at the GetEntriesSince(...) method you'll see how the "filtering" feature was implemented:
  • First, the component has a flag (given at construction time) to indicate whether it must filter or not
  • Second, the filter is implemented as a hard-coded piece of code inside the if statement using LINQ filters
  • Third, the "bad words" collection is hard-coded inside the class

Now imagine what would happen in each of the following scenarios:
  1. From time to time the client requests a change to the "bad words" list: new words are banned, then phrases, then abbreviations such as WTF
  2. One day a requirement arrives asking to filter entries by their length, so only entries up to a certain character limit will be considered
  3. Another change arrives asking for a "little" tweak on the length filter to allow a length range, not just an upper bound
  4. Then a regulation forces us to implement a parental-control feature with complex filters depending on the audience
  5. Finally, your users request a new feature, a "Mood Filter": they want a "flexible" filter that can be parameterized with allowed and forbidden moods. For example, you can allow only "happy" entries, or eliminate entries on "bad" topics such as Politics, Dogs, or Celebrity Gossip. This crazy request is backed by some NLP (natural language processing) guru, and will be implemented using a third-party sentiment analysis API.
  6. ...
If we were to change the GetEntriesSince(...) method for every request affecting the "filtering" capability, our hub would get touched a lot. It could easily reach 2,000+ lines in a couple of months. So we need to kill that monster before it grows.

Cleaning the code

Honoring the OCP

The required fix is quite simple: we just need to apply the same strategy used for the collectors.
  • We extract the filter functionality into a contract,
  • Then implement the bad-words filtering feature as the first realization of that contract,
  • Finally, allow the hub to work without filters by simply returning the raw collection of entries when no filters were added
This way we shield the basic algorithm from the constant change generated by the volatile "filter" functionality.

The component diagram now looks like this:
Second class layout

Here are the relevant changes:
  • The filter functionality was extracted into the EntriesFilter component, just as the collection functionality was with the Collectors family.
  • Collectors are now required to add a unique hash to each collected entry, including a source part, something like this: 'facebook-bc7702b7-9f83-49b0-8245-f11c72de4b04'; the UID could be provided by the underlying API, or just generated on the fly by the collector.
  • Each filter component must accept a collection of entries and return a collection of hashes for the filtered-out ones, or an empty collection if none.
The first two code samples below are the new, clean implementation of our hub and the extracted "bad words" filter:

Clean hub:
public class CleanSocialNetworksHub : SocialNetworksHub
{
    private readonly IList<SocialNetworkPullCollector> collectors;
    private readonly IList<EntriesFilter> filters;

    public CleanSocialNetworksHub()
    {
        this.collectors = new List<SocialNetworkPullCollector>();
        this.filters = new List<EntriesFilter>();
    }

    public void AddCollector(SocialNetworkPullCollector collector)
    {
        this.collectors.Add(collector);
    }

    public void AddFilter(EntriesFilter filter)
    {
        this.filters.Add(filter);
    }

    public IEnumerable<Entry> GetEntriesSince(DateTime since)
    {
        var entries = this.collectors.SelectMany(collector => collector.Collect(since));

        return FilterEntries(entries).OrderBy(entry => entry.Date).Reverse();
    }

    private IEnumerable<Entry> FilterEntries(IEnumerable<Entry> entries)
    {
        var filterOutHashes = new ConcurrentBag<String>();

        Parallel.ForEach(this.filters, (filter) =>
        {
            IEnumerable<String> invalidEntriesHashes = filter.Filter(entries);
            foreach (var hash in invalidEntriesHashes)
            {
                filterOutHashes.Add(hash);
            }
        });

        var finalExclusions = filterOutHashes.Distinct().ToArray();

        return entries.Where(entry => !finalExclusions.Contains(entry.Hash));
    }
}

Bad words filter:
public sealed class BadWordsFilter : EntriesFilter
{
    private readonly ISet<String> badWords = new HashSet<String>();

    public BadWordsFilter(IEnumerable<String> badWords)
    {
        var words = badWords ?? new List<String>();
        var normalizedWords = words.Where(word => !String.IsNullOrWhiteSpace(word))
                                        .Select(word => word.ToLowerInvariant().Trim());
        foreach (var word in normalizedWords)
        {
            this.badWords.Add(word);
        }
    }

    public IEnumerable<string> Filter(IEnumerable<Entry> entries)
    {
        foreach (var entry in entries.Where(entry => IsBadEntry(entry)))
        {
            yield return entry.Hash;
        }

        yield break;
    }

    private bool IsBadEntry(Entry entry)
    {
        if (this.badWords.Any(badWord => entry.Text.ToLowerInvariant().Contains(badWord)))
        {
            return true;
        }

        var lowerTags = entry.Tags.Select(tag => tag.ToLowerInvariant());
        if (lowerTags.Any(tag => this.badWords.Any(badWord => tag.Contains(badWord))))
        {
            return true;
        }

        return false;
    }
}

The usage of the clean hub will be something like this (see tests for real examples):
[Test]
public void TestSomething()
{
    /* Test data: Fake entries */
    IEnumerable<Entry> entries = ...;

    /* Dummy collector (boundary interface) */
    SocialNetworkPullCollector returnAllCollector = new ReturnAllPullCollector(entries);

    /* Filter: real implementation */
    EntriesFilter filter = new BadWordsFilter("nestedLoops,deadCode,copy-n-paste".Split(','));

    /* Hub instantiation and configuration */
    SocialNetworksHub hub = new CleanSocialNetworksHub();
    hub.AddCollector(returnAllCollector);
    hub.AddFilter(filter);
}

Going further:
If you look at the final version of the GetEntriesSince(...) method, you see just two lines of code: one that collects all entries and another that does the filtering and sorting. If we start to receive crazy sorting requirements, like by social network then by date, or by author ranking, then by social network, and then by date, etc., we must apply a similar solution and extract the "sorting" responsibility into a new family of components. If you like, just fork the GitHub repository and do that.


Agile Link

Again, I'll elaborate on the link between the code and agile thinking using the Agile Principles from the manifesto.
The best architectures, requirements, and designs emerge from self-organizing teams.
The clean implementation is an example of "emergent design". The initial requirement gave a clear insight into the "collection" mechanisms, so from day one we devised a flexible solution for that responsibility. After our imaginary first release, we started to see more and more incoming changes to the "filter" responsibility, so at that point we evolved our simple solution into a more maintainable one. We are also honoring the KISS principle, which says
Keep it simple, stupid.

Series links

Previous, SOLID Principles: SRP and DIP (1 of 4)
Next, SOLID Principles: LSP (3 of 4)
