notes on software development: Undisciplined refactoring can create more spaghetti code

When finding a blob of code that is hard to comprehend, it is tempting to just dig in and break apart the blob into an number of seemingly related functions. Functional decomposition is a powerful concept in math and computer science. It breaks down a large and sometimes incomprehensible solution into one that is easier to understand; and as a result provides a more stable solution.

In the heat of the moment this can make a lot of sense by reading the code and breaking it apart for what it currently does and perhaps not what its intended to do. What this can lead to a more convoluted code-base in the name of refactoring.
Its easy to cut these corners when you are under pressure to fix a ticket, and/or tired from a long week. In the rush to fix be careful to see the larger design picture so you don't end up pushing that work further into the future.

So, instead of analysing what the software does at the moment, take a minute to understand what it is supposed to do. This is sometimes referred to as 'first principles' and is key to really understanding the problem; and as a result create a nice solution. How then would we do that in programming? By taking a proven problem solving approach that is used to solve any applied mathematics problem.

How to solve it

Remember our fundamental "How to solve it" steps. Understand, plan, execute, review. For seemingly easy fixes it can seen as overkill to keep to skip the first two steps and understand (design) on the fly while we execute (programming) on the solution. Stick to the more disciplined approach of moving through the steps as a simple checklist so you don't have to keep everything in your head.
This isn't a checklist to go through once, its really a mechanism to break down larger problem and build the solutions accordingly.

A moment to understand the Design

Are the intentions of the feature defined? If not, write down what value this feature is supposed to deliver. The features that implement this are defined by their responsibilities, not necessarily what its doing at the moment.
Take a moment to understand not what the code is doing, but what the responsibilities of the containing object or function has. Understand these responsibilities into an interface or better formed class, and then re-factor to fulfill those responsibilities.
Understanding the actual responsibilities can take some time, but not much; and you do have the time to get this into a coherent structure so you don't have to revisit it later. Draw a diagram and link the dependencies, inputs and the outputs to get a solid understanding of what this is supposed to do.

How it fits in. Whats the plan?

How does this class fit into the bigger picture? A good strategy is to use the 4 c model to see where your class or function fits into the component you are modifying, and if that 'fits in' to the intended functionality. Good architecture fits in, and doesn't stand out.

A Simple Example

Ok, enough talk-talk; lets go through an example. Lets say we are building a feature that lets users login to our fancy web app with social accounts. To 'make it work' and experimenting with the API we might end up with something like:

# using authomatic library and skipping code segments for sake of example
def get_user(network):
    email = None
    if (network == 'linkedin'):
        url = 'https://api.linkedin.com/v1/people/~?format=json'
        response_data = authomatic.login(self, network)
        first_name = response_data.get('firstName')
        last_name = response_data.get('lastName')
        email = response_data.get('emailAddress')
    elif (network == 'twitter'):
        url = 'https://api.twitter.com/1.1/statuses/user.json'
        response_data = authomatic.login(self, network)
        email = response_data.get('email')
    elif (network == 'facebook'):
        url = 'https://api.facebook.com/v2.12/'+result.user.id+'/me?fields=id,name,email,picture'
        response_data = authomatic.login(self, network)
        email = response_data.get('email')

    # we have a user class that requires an email in the constructor
    return User(email)

# 'refactor' attempt to smaller functions
def get_facebook_user():
    email = None
    url = 'https://api.facebook.com/v2.12/'+result.user.id+'/me?fields=id,name,email,picture'
    response_data = authomatic.login(self, network)
    email = response_data.get('email')
    return User(email)

def get_linkedin_user():
    url = 'https://api.linkedin.com/v1/people/~?format=json'
    response_data = authomatic.login(self, network)
    first_name = response_data.get('firstName')
    last_name = response_data.get('lastName')
    email = response_data.get('emailAddress')
    return User(email)

def get_twitter_user():
    url = 'https://api.linkedin.com/v1/people/~?format=json'
    response_data = authomatic.login(self, network)
    first_name = response_data.get('firstName')
    last_name = response_data.get('lastName')
    email = response_data.get('emailAddress')
    return User(email)


def get_user(network)
    if (network == 'linkedin'):
        return get_linkedin_user()
    elif (network == 'twitter'):
        return get_twitter_user()
    elif (network == 'facebook'):
        return get_facebook_user()

This is maybe a bit better, but to modify this to get more data from the api or to add error handling, the functions just expand as you add more logic. This is also only able to work when the app is running, how do I test this?
Let's take a moment and understand what we are trying to do:

The application needs a User object to identify who is logged in
The application can authenticate with an external identity provider

The responsibilities of this object then become: Authenticate, build User, handle errors, so lets refactor to a class that assumes these responsibilities, and no matter the network we use, we end needing these responsibilities fulfilled.

Even better, use an abstract class or interface to define the behavior. Objects have responsibilities and those responsibilities are fulfilled by behavior

The interface defines the behavior, so test and check for exceptions from logical methods that are defined in an interface. This is the main contract that the component has. (should have...of course. So much code doesn't have any structure)

class SocialAuthenticator(object):
    @abc.abstractmethod
    def authenticate(self, user_data, response_data):
        pass 
    @abc.abstractmethod
    def build_user(self, data):
        pass 
    @abc.abstractmethod    
    def get_api_data(self):
        pass

# So for any network you are using, you just extend with its own logic
class TwitterAuthenticator(SocialAuthenticator):

    def authenticate(self):
        return authomatic.login(self, 'twitter')
    
    def build_user(self, result):
        response = self.authenticate()
        email = response.get('emailAddress')
        return User(email)
        
    def get_api_data(self, result):
        url = 'https://api.twitter.com/1.1/statuses/user_timeline.json'
        response = result.provider.access(url, {'count': 5})
        return response

# This structure is more testable, so you can get the response data, and save it as a json file, and use that to test the creation of your users
import unittest
class test_authentication(unittest.TestCase):

    def get_twitter_data():
        #load and return file that would be reponse from twitter

    def test_create_user(self):
        auth = TwitterAuthenticator()
        self.assertNone(auth)
        test_result = get_twitter_data()
        user = auth.build_user(test_result)
        self.assertEquals(user.email, 'mytestaccount@whateves.com')

    #subsequent tests for creating users from different identity providers

Now we have a structure we can test without running the application, and can extend to get more data from each network. Also, we can just make a new class for any new networks we want authentication and user data from.

Wrap it up

By taking a pause and defining the responsibility of the object we were able to better understand the problem we had, which enabled a coherent plan for the execution of the solution. This results in a more coherent design which is much more easily understood, and also testable since we have an interface (or abstract class) to test against.

https://en.wikipedia.org/wiki/Decomposition_(computer_science)

notes on software development

Tuesday, January 08, 2019

Undisciplined refactoring can create more spaghetti code

How to solve it

A moment to understand the Design

How it fits in. Whats the plan?

Wrap it up

No comments:

Post a Comment