Five Great Practices for Safer Code

July 30, 2016July 30, 2016 Henry Dang General Programming, Java, python, Software Development

You’re sitting at your desk, glaring at your monitor, but it glares back at you with equal determination.

Every change you make introduces new bugs, and fixing a bug causes another bug to pop up.

You don’t understand why things are randomly breaking, and the lines of code just increase every day.

However, by coding in a rigorous and specific fashion, you can prevent many of these issues simply by being slightly paranoid. This paranoia can save you hours in the future, just by dedicating a few extra seconds to include some additional safeguards.

So without further ado, let’s jump right into the top five tips for safer code.

1. Stop Accepting Garbage Input

The common phrase “Garbage in, Garbage out” is one that rings strongly with many programmers. The fact is, if you accept garbage input, you’re going to pass out garbage output. If your code has any modularity at all, then something like this will likely happen :

def foo(input):
  do_stuff

def bar(input):
  do_other_stuff

garbage_input = 'Hi. I'm garbage input.'

some_variable = foo(bar(garbage_input))

As you call foo and bar and other functions, all of which depended on garbage_input, you find that everything has turned into garbage. As a result, functions will start throwing errors a few dozen passes down the line, and things will become very difficult to debug.

Another common mistake is attempting to correct the user’s input in potentially ambiguous cases, which leads to the second tip.

2. Don’t Try to Correct Garbage Input

Let’s take an example scenario :

Imagine you had a box that exported values from 0 to 1 on a display, depending on the number the user passed in.

One day, you suddenly get a value of 1.01, a value slightly higher than the maximum. Now, this should raise a red flag for most programmers. However, some programmers resort to doing the following :

def calculateValue(temperature):
  do_calculations

def getBoxValue(temperature):
  if calculateValue(temperature) > 1 :
    return 1
  elif calculateValue(temperature) < 0 :
    return 0
  else:
    return calculateValue(temperature)

The technique shown above is known as clamping, which is basically restricting the value to a certain range. In this case, it is clamped to 0 and 1. However, the problem with the above example is that it is now impossible to debug the code.

If the user passed in bad input, you would get a clamped answer, instead of an error, and if the calculateValue function was buggy, you would never know. It could be slightly inflating the value, and you would still never know, because the values would be clamped.

As an exaggerated example, if calculateValue returned 900,000,000, all you would see is “1”. Instead of embracing and fixing bugs, this tactic throws them under the carpet in the hopes that no one will notice.

A better solution would be :

def calculateValue(temperature):
  do_calculations

def getBoxValue(temperature):
  if(calculateValue(temperature) > 1
       or calculateValue(temperature) < 0):
    raise ValueError('Output is greater than 1 or less than 0.')
  else:
    return calculateValue(temperature)

If your code is going to fail, then fail fast and fix it fast. Don’t try to polish garbage. Polished garbage is still garbage.

3. Stop Double Checking Boolean Values in If Statements

Many programmers already adhere to this principle, but some do not.

Since Python prevents the bug caused by double checking a boolean value, I will be using Java, as the bug can only happen in languages where assignment is possible in if statements.

In a nutshell, if you do this :

boolean someBoolean = true;

if(someBoolean == true) {
  System.out.println('Boolean is true!');
} else {
  System.out.println('Boolean is false!');
}

In this case,

if(someBoolean == true)

Is exactly equivalent to :

if(someBoolean)

Aside from being redundant and taking up extra characters, this practice can cause horrible bugs, as very few programmers will bother to glance twice at an if statement that checks for true/false.

Take a look at the following example.

boolean someBoolean = (1 + 1 == 3);

if(someBoolean = true) {
  System.out.println('1 + 1 equals 3!');
} else {
  System.out.println('1 + 1 is not equal to 3!');
}

At first glance, you would expect it to print out “1 + 1 is not equal to 3!”. However, on closer inspection, we see that it prints out “1 + 1 equals 3!” due to a very silly but possible mistake.

By writing,

if(someBoolean = true)

The programmer had accidentally set someBoolean to true instead of comparing someBoolean to true, causing the wrong output.

In languages such as Python, assignment in an if statement will not work. Guido van Rossum explicitly made it a syntax error due to the prevalence of programmers accidentally causing assignments in if statements instead of comparisons.

4. Put Immutable Objects First In Equality Checks

This is a nifty trick that piggy backs off the previous tip. If you’ve ever done defensive programming, then you have most likely seen this before.

Instead of writing :

if(obj == null) {
  //stuff happens
}

Flip the order such that null is first.

if(null == obj) {
  //stuff happens
}

Null is immutable, meaning you can’t assign null to the object. If you try to set null to obj, Java will throw an error.

As a result, you can prevent the silly mistake of accidentally causing unintentional assignment during equality checks. Naturally, if you set obj to null, the compiler will throw an error because it’s checking a null object when it expects a boolean.

However, if you are passing around methods inside the if statement, it can become dangerous, particularly methods that will return a boolean type. The problem is doubly bad if you have overloaded methods.

The following example illustrates this point :

final int CONSTANT_NUM = 5;

public boolean foo(int x){
  return x%2 != 0;
}

public boolean foo(boolean x){
  return !x;
}

public void compareVals(int x){
  if(foo(x = CONSTANT_NUM)){
    //insert magic here
  }
}

In this example, the user expects foo to be passed in a boolean of whether or not x is equal to a constant number, 5.

However, instead of comparing the two values, x is set to 5. The expected value if the comparison was done correctly would be false, but if x is set to CONSTANT_NUM, then the value will end up being true instead.

5. Leave Uninitialized Variables Uninitialized

It doesn’t matter what language you use, always leave your uninitialized variables as null, None, nil, or whatever your language’s equivalent is.

The only exception to this rule is booleans, which should almost always be set to false when initialized. The exception is for booleans with names such as keepRunning, which you will want to set initially to true.

In Java’s case,

int x;
String y;
boolean z = false;

In particular, for Python especially, if you have a list, make sure that you do not set it to an empty list.

The same also applies to strings.

Do this :

some_string = None
list = None

Not this :

some_string = ''
list = []

There is a world of a difference between a null/None/nil list, and an empty list, and a world of a difference between a null/None/nil string, and an empty string.

An empty value means that the object was assigned an empty value on purpose, and was initialized.

A null value means that the object doesn’t have a value, because it has not been initialized.

In addition, it is good to have null errors caused by uninitialized objects.

It is unpleasant to say the least when an uninitialized string is set to “” and is prematurely passed into a function without being assigned a non-empty value.

As usual, garbage input will give you garbage output.

Conclusion

These five tips are not a magical silver bullet that will prevent you from making any bugs at all in the future. Even if you follow these five tips, you won’t suddenly have exponentially better code.

Good programming style, proper documentation, and following common conventions for your programming language come first. These little tricks will only marginally decrease your bug count. However, they also only take about an extra few seconds of your time, so the overhead is negligible.

Sacrificing a few seconds of your time for slightly safer code is a trade most people would take any day, especially if it can increase production speed and prevent silly mistakes.

Bogosort : Moving At The Pace of a Random Snail

July 24, 2016July 26, 2016 Henry Dang Uncategorized

Bogosort is a sorting algorithm, much like quick sort, and merge sort, with a twist.
It has an average case performance of O((n+1)!).

Bogosort is known by many other names, including stupid sort, monkey sort, and permutation sort.

Knowing this critical information, let’s dive into the meat of this sorting algorithm.

How Does It Work?

Bogosort works by taking an input list, and checking if it is in order. If it is, then we are done, and the list is returned. This means that Bogosort has a best case performance of O(n).

If the list isn’t in order, then we will shuffle the list until we get a list that is sorted. Unfortunately, since the list will be shuffled randomly, Bogosort technically has a worst case performance of O(∞).

Implementation

Bogosort’s implementation is short and concise.

import random

def Bogosort(list):
    while not isSorted(list):
        random.shuffle(list)
    return list

def isSorted(list):
    if(len(list) &amp;amp;lt;= 1):
        return True

    if(list[1] &amp;amp;gt;= list[0]):
        return isSorted(list[1:])
    else:
        return False

Simple and crisp. Let’s try it on some inputs.
Note that I used Python’s time module to print out the time, which is quick and dirty, but it does the job.

Bogosort([5,4,3,2,1])
Time = 0.000202894210815

Okay. Now let’s try it again on an input size of 10.

Bogosort([10,9,8,7,6,5,4,3,2,1])
52.8398268223

Since it is completely randomized, running this will give different times each time it is run, but the ballpark figure will at least be somewhat similar.

Normally, I’d try on a larger input size, but since the growth rate is going to be exponential, an input size of 11 will cause a time that is much longer than when an input size of 10 is used.

In fact, in our example, running on an input size of 10 took over 260,000 times longer than when run on an input size of 5.

However, since the numbers are random, it is impossible to determine exactly how much longer it will take , but it should be somewhere around that range.

Conclusion

If you are looking for a horrendously slow sorting algorithm for any reason, Bogosort is the perfect candidate. It works by randomly shuffling the list until it gets a sorted list, which means that its average performance is going to be O((n+1)!).

Of course, Bogosort should not be taken as a serious sorting algorithm, and is intended to be used as a comparison to other sorting algorithms like quick sort and merge sort.

Unless your goal is to create slow code, stay far far away from this monstrosity.

Why You Shouldn’t Validate Emails with Regex

July 17, 2016July 17, 2016 Henry Dang General Programming, regex

Foo, a programmer working at FooBar Inc., is happily working on a registration form, when he think to himself,

“Hmm. I should probably check if the email is valid.”

Somewhere along the line, Foo connect the words “email” and “valid” with “regular expressions”.

But as the great Jamie Zawinski once stated,

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

Implementing Regular Expressions to Validate Email Addresses

Now, for the sake of demonstration, let’s look into the thought processes of Foo.

First, Foo thinks, “Well, what makes an email address valid?”

He quickly scribbles down a list.

Email must contain an “@” symbol.
Email must contain a “.” symbol.
Email must not contain spaces.
Email can have the character sets [a-zA-z]
Email can have the number sets [0-9]

Foo also knows that the order must always go as follows :

Word
“@” symbol
“.” symbol
Domain Name

Okay. Seems simple enough, right?

Foo writes the following regex :

^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+

And it works flawlessly! At first, anyway.
It turns out that a regex statement like this won’t account for emails like “henry.dang@henrydangprg.com” or “henry-dang@henrydangprg.com”

The solution? As long as it’s [az-A-Z0-9._], it should be fine, up until the “@” symbol.

^[a-zA-Z0-9.-]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+

So far so good! Looks like everything’s working. But hold on, your friend tries to sign up with his email, “henry.dang@henry.dang.prg.com”, which is an absolutely valid email. Now you have to compensate for multiple periods after the “@” symbol!

But even after Foo fixes that, he realizes that his input could be infinitely large if someone strung together infinite periods! (EX : henry.dang@henry.henry.henry.henry.henry.henry.henry… and so on)

But wait! There’s more! The user could have a plus sign in his email, but not in the domain. And wait! What about foreign characters? And apostrophes? And all those other wild characters that hardly anyone would use in an email, but would still be valid if they really wanted to use it?

The fact of the matter is that Foo will most likely be unable to account for every single possible letter and combination. There are simply too many, and attempting to match all of them will simply bring in angry customers who have some esoteric symbol in their email that you failed to account for.

So What’s the Solution?

The solution is simple. Don’t use regex. It is not the right tool for the job. All you need to do is check if the user has an “@” and a “.” in their email. Anything else is extraneous and will lead to some user emailing you about how they can’t register with their email.

There is no point in attempting to check the infinite possibilities for an email. Send the user a confirmation email. If their email is valid, they will receive an email. If not, then their email was invalid, and they should change it.

For the stubborn or curious user, there is a solution available here. Clocking in at a whopping 6424 characters, this monstrous and unreadable regular expression is the last thing you want to use in your code.

	Soup&Ties on How To Get Your Company Hacked…
	K NARAYANA PRASAD on Why Keyword Arguments in Pytho…
	OpenCV HSV 颜色空间… on Color Detection in Python with…
	1.8 Development Step… on Color Detection in Python with…
	Henry Dang on How URL Routing in Django…

henrydangprg

A programming blog, by a programmer, for programmers

Month: July 2016

Five Great Practices for Safer Code

1. Stop Accepting Garbage Input

2. Don’t Try to Correct Garbage Input

3. Stop Double Checking Boolean Values in If Statements

4. Put Immutable Objects First In Equality Checks

5. Leave Uninitialized Variables Uninitialized

Conclusion

Bogosort : Moving At The Pace of a Random Snail

How Does It Work?

Implementation

Conclusion

Why You Shouldn’t Validate Emails with Regex

Implementing Regular Expressions to Validate Email Addresses

So What’s the Solution?