Sunday, January 3, 2016

SOLID Principles: OCP (2 of 4)

If you missed the previous post from this series check it out SOLID Principles: SRP and DIP (1 of 4).

Open Closed Principle

The OC principle states that
"software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification"
 lets examine the meaning of extension and modification:
  • Extension means that the current functionality could be, relatively easy, enriched. 
  • Modification means the most insulting of the changes we could made to a component "to touch its source code". This is what we must try hard not to do.

Code sample

Code samples are on github repo lsolano/blog.solid.demo
This sample is built using C# and the .NET Framework 4.5. It is a "Social Network Hub", imagine you as a very busy / lazy person and because of that you don't want (or have the time to), be constantly checking all different social networks to see your friends updates. You want a single place to look for Facebook, Twitter, Instagram, etc., updates. You start by devising a software component called SocialNetworksHub that will check all your sources and will present the information in a consolidated way.

To keep the model simple, suppose we are only interested on the following data elements for each entry: text (no videos, images, audio, etc.), tags (labels, hash-tags, etc.), publication date, source, and author. The "normal" behavior for the component is to have a pulling interval for social networks without a push API, and to listen for push notifications from the ones with that support.

The hub has some helper components, they know how to "talk" with specific social networks APIs and convert that data to our internal model. Those components are called "collectors" with two kinds: Pull (on demand), and Push / Callback. The later are listeners for social networks supporting push notifications. All entries are sorted by date, the most recent ones first. Finally the hub support a predefined "filtering" capacity: it filters entries with "bad words".

Here is the "dirty" version of this component:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
public class DirtySocialNetworksHub : SocialNetworksHub
{
    private readonly IList<SocialNetworkPullCollector> collectors;
    private readonly bool removeEntriesWithBadWords;

    private readonly IEnumerable<String> badWords 
            = (new List<String> { "nestedLoops", "deadCode", "copy-n-paste" })
                .Select(word => word.ToLowerInvariant()).ToList();

    public DirtySocialNetworksHub()
        : this(false)
    {
    }

    public DirtySocialNetworksHub(bool removeEntriesWithBadWords)
    {
        this.collectors = new List<SocialNetworkPullCollector>();
        this.removeEntriesWithBadWords = removeEntriesWithBadWords;
    }

    public void AddCollector(SocialNetworkPullCollector collector)
    {
        this.collectors.Add(collector);
    }

    public IEnumerable<Entry> GetEntriesSince(DateTime since)
    {
        var entries = this.collectors.SelectMany(collector => collector.Collect(since));

        if (this.removeEntriesWithBadWords) {
            return entries.Where(entry => 
                !this.badWords.Any(badWord => entry.Text.ToLowerInvariant().Contains(badWord))
             && (!entry.Tags.Any() || entry.Tags.Select(tag => tag.ToLowerInvariant())
                    .Where(tag => this.badWords.Any(badWord => tag.Contains(badWord))).Count() == 0))
                            .OrderBy(entry => entry.Date).Reverse();
        }

        return entries.OrderBy(entry => entry.Date).Reverse();
    }
}


This is an overview of all components, the class above is an implementation of the orange contract (interface):
First class layout
First class layout

What is wrong with this code: OCP?

If you look closer to the GetEntriesSince(...) method you'll see how the "filtering" feature was implemented:
  • First, the component has a flag (given at construction time) to indicate if it must filter or not
  • Second, the filter is implemented as a hard-coded piece of code inside the if statement using Linq filters
  • Third, the "bad words" collection is hard coded inside the class

Now imagine what would happen in each of the following scenarios:
  1. From time to time the client request a change in the "bad words" list, new words are banned, then phrases, also abbreviations as WTF
  2. One day a requirement arrives asking to filter entries by their extension, so only entries with up to certain characters limit will be considered
  3. Another change arrives asking for a "little" tweak on the extension filter to allow for a length range, not just an upper bound
  4. Then a regulation force us to implement a parental control feature with complex filter depending on the audience
  5. Finally, your users request a new feature a "Mood Filter", they want you to implement a "flexible" filter that can be parameterized with allowed and forbidden moods. For example you can allow only "happy" entries, also you can eliminate entries by "bad" topics such as Politics, Dogs, or Celebrity's Gossip. This crazy request is backed up by some NLP (Neuro-linguistic programming) Guru, and will be implemented using a third party Sentiment analysis API.
  6. ...
If we were to change the GetEntriesSince(...) method for every request affecting the "filtering" capacity our hub will get touched a lot. It can easily get to 2,000+ lines in a couple of months. So we need to kill that monster before it grows.

Cleaning the code

Honoring the OCP

The required fix is quite simple: we just need to implement the same strategy used for the collectors. 
  • We extract the filter functionality into a contract, 
  • Then, implement the bad words filtering feature a the first realization of that contract, 
  • Finally, allow the hub to work without filters by just returning the raw collection of entries when no filters added
This way we leave out the basic algorithm from the possible constant change generated by the volatile "filter" functionality. 

The components diagrams now looks like this:
Second class layout
Second class layout

Here are the relevant changes:
  • The filter functionality was extracted to the EntriesFilter component, just as the collection with the Collectors family. 
  • Collectors are now required to add a unique hash to each collected entry including the a source part, some like this: 'facebook-bc7702b7-9f83-49b0-8245-f11c72de4b04', the UID could be provided by the underlying API, or just generated on the fly by the collector.
  • Each filter component must accept a collection of entries and return a collection of hashes for the filter-out ones, or empty if none.
The first two code samples below are the new, clean, implementation of our hub, and the extracted out "bad words" filter:

Clean hub:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
public class CleanSocialNetworksHub : SocialNetworksHub
{
    private readonly IList<SocialNetworkPullCollector> collectors;
    private readonly IList<EntriesFilter> filters;

    public CleanSocialNetworksHub()
    {
        this.collectors = new List<SocialNetworkPullCollector>();
        this.filters = new List<EntriesFilter>();
    }

    public void AddCollector(SocialNetworkPullCollector collector)
    {
        this.collectors.Add(collector);
    }

    public void AddFilter(EntriesFilter filter)
    {
        this.filters.Add(filter);
    }

    public IEnumerable<Entry> GetEntriesSince(DateTime since)
    {
        var entries = this.collectors.SelectMany(collector => collector.Collect(since));

        return FilterEntries(entries).OrderBy(entry => entry.Date).Reverse();
    }

    private IEnumerable<Entry> FilterEntries(IEnumerable<Entry> entries)
    {
        var filterOutHashes = new ConcurrentBag<String>();

        Parallel.ForEach(this.filters, (filter) =>
        {
            IEnumerable<String> invalidEntriesHashes = filter.Filter(entries);
            foreach (var hash in invalidEntriesHashes)
            {
                filterOutHashes.Add(hash);
            }
        });

        var finalExclusions = filterOutHashes.Distinct().ToArray();

        return entries.Where(entry => !finalExclusions.Contains(entry.Hash));
    }
}

Bad words filter:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
public sealed class BadWordsFilter : EntriesFilter
{
    private readonly ISet<String> badWords = new HashSet<String>();

    public BadWordsFilter(IEnumerable<String> badWords)
    {
        var words = badWords ?? new List<String>();
        var normalizedWords = words.Where(word => !String.IsNullOrWhiteSpace(word))
                                        .Select(word => word.ToLowerInvariant().Trim());
        foreach (var word in normalizedWords)
        {
            this.badWords.Add(word);
        }
    }

    public IEnumerable<string> Filter(IEnumerable<Entry> entries)
    {
        foreach (var entry in entries.Where(entry => IsBadEntry(entry)))
        {
            yield return entry.Hash;
        }

        yield break;
    }

    private bool IsBadEntry(Entry entry)
    {
        if (this.badWords.Any(badWord => entry.Text.ToLowerInvariant().Contains(badWord)))
        {
            return true;
        }

        var lowerTags = entry.Tags.Select(tag => tag.ToLowerInvariant());
        if (lowerTags.Any(tag => this.badWords.Any(badWord => tag.Contains(badWord))))
        {
            return true;
        }

        return false;
    }
}

The usage of the clean hub will be something like this (see tests for real examples):
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
[Test]
public void TestSomething()
{
    /* Test data: Fake entries */
    IEnumerable<Entry> entries = ...;

    /* Dummy collector (boundary interface) */
    SocialNetworkPullCollector returnAllCollector = new ReturnAllPullCollector(entries);

    /* Filter: real implementation */
    EntriesFilter filter = new BadWordsFilter("nestedLoops,deadCode,copy-n-paste".Split(','));

    /* Hub instantiation and configuration */
    SocialNetworksHub hub = new CleanSocialNetworksHub();
    hub.AddCollector(returnAllCollector);
    hub.AddFilter(filter);
}

Going further:
If you look to the final version of the method GetEntriesSince(...), you see just two lines of code, one that collect all entries and another that does the filtering and sorting. If we start to receive crazy sorting requirements like: by social network then by date, or by author ranking, then by social network, and then by date, etc.; then we must apply a similar solution and extract out the "sorting" responsibility to a new family of components. If you like just fork the github repository and do that.


Agile Link

Again, I'll elaborate the code link to agile thinking using the Agile Principles from the manifesto.
The best architectures, requirements, and designs emerge from self-organizing teams.
The clean implementation is an example of "emergent design". The initial requirement has a clear insight about the "collection" mechanisms so from day one we devise a flexible solution for this responsibility. After our imaginary first release, we start to see more an more incoming changes to the "filter" responsibility, so by that time we evolve our simple solution to a more maintainable one. We are also honoring the KISS principle that says
Keep it simple, stupid.

Series links



No comments:

Post a Comment

enter your comments here...

Subscribe to our mailing list

* indicates required
Email Format