Archive for the ‘Ideas and Rants’ Category

Rant About Null

Monday, August 10th, 2015

The null value seems to be very misunderstood concept in programming, especially among beginners. I do not think that there is any definite ruleset how or how not to program, there is no best practice in my mind. Still there are things that just make sense if a person has enough information about their environment. In this post, I will elaborate some differences between a null and an empty value in hopes of favoring more critical thinking when a situation calls for those concepts.

So, in many programming languages there are different types of variables. For example an integer type will hold integer numbers, a string will hold some text etc. This is familiar to anyone who has learned some programming skills.

Type Example values
Int 0, 1, 2
Float 0.0, 2.5, 10.7
String '', 'abc', 'Hello, World!'
Boolean true, false

While those and the following concepts are true in many languages, I will here concentrate on PHP. In PHP there are some kinks, which I think should make people think about different programming concepts even more. Of course this can be said for many languages. PHP is a very specific language meant for web back-end development. It has gotten some weird reputation over the years, which I am not going to address here currently.

Equality Check in PHP

In PHP the regular == check will compare by value. There are some values that are considered equal and pass the check, which in other languages would probably fail. One use case where you might want to have such a comparison could be to check if there is some non-empty string input. For that you could do just:

if ($inputField) {
//Do something with that

Here if there is a string in the $inputField, it can be parsed as a boolean and checked in the if-statement. Although in some regard it is not such a good way to write that check. For someone unfamiliar with how PHP behaves, the flow of this condition is not easily graspable. It would be better to write:

if ($inputField != '') {
//Do something with that

Now it is easily seen, that if the value is not an empty string, we can proceed with the code inside the if-block. Alas, if the value of $inputField happens to be an integer 0 or a boolean false, then the check also fails. Even more worrying is the behavior if $inputField is some other integer or a boolean true. Then the check passes and we proceed inside the if-block as well. Check out the comparison tables from PHP-s manual to get a better picture.

There is also a strict comparison === that compares both the type and the value.

Comparison Result
'' == 0 true
'' === 0 false

The Null Value

In addition to regular values, there is also a special null value, which basically means that no value has been assigned. For example think of some Items that each can be in a single Category. Emphasis is on the can be, which means that some Items can also not be in a category. It makes sense that if an Item is not inside a category, then its category field is equal to null. But in PHP equal to null might mean also that it has a value 0 or an empty string.

The MySQL Database and Foreign Keys

A PHP application is very often integrated with a MySQL relational database. Usually through an ORM like Doctrine or Eloquent in order to make development more easier and code readable.

In a novice application one might find something like this from the database:

There are primary auto-increment id-s, but the relation key is just a regular integer field. As we stated before, in PHP 0 is equal to null by value, and so often people will just put 0 in the category_id field, if there is no category assigned. This means that the entries in the items table will look like this:

This is quite bad in a variety of reasons. Main problem is that it is missing foreign keys. One reason to have them is to ensure consistency in the database. If this category_id field should have only id-s from the categories table in it, then a foreign key will check that. Usually a foreign key will also come with an index, which means that joins or selects on that relation are faster.

I understand that one may think there is no need for any database checks. The code is what defines the behavior and if the code does not write anything else into that field, then nothing will break. That is true, but no code is 100% correct and accidentally writing a wrong value into a foreign key field is quite common. Think of it as fastening a seat belt, when driving a car. Accidents do not happen all that often, but we still fasten the seat belts. When you mess up the database, then fixing it may take a lot of man hours or is sometimes even impractical.

So the database should actually look like this:

We can even see the relation now. While many-to-one and one-to-one relations are exactly the same from the database perspective, a modelling tool allows you to distinguish between them. So here we can see (the inverse arrowhead near the items table) that many items can belong to one category. Or in other words, one category has many items.

Plus, having foreign key will also make navigating through the data via a interface more easier.

Notice that the category_id values have now turned into links. This means that I can click on them and go to a category defined by the corresponding id value.

Furthermore, if one does a join, the EXPLAIN will show that it is using the foreign key index, which is fast.

Compared to our previous case without a foreign key, we get something like this:

All sorts of warning things like using where and using join buffer appear here. There are also no possible keys available. This usually means that the query will become really slow if the data amounts grow.

Here is a simple test. Let us say we have 300 categories and 300 items in each category. All together 90000 items. Say the user has picked a category and we now want to list all the items belonging to that category. MySQL will first fetch our category based on the primary key, and then tries to find all items matching that category id.

If we have a foreign key index, MySQL can use that to find the corresponding items in logarithmic time.

When we do not have that index, it will become linear time, indicated by the Using where.

The time difference is about 15 times for one query. This is a lot.

Null Value in MySQL

So we would want to use foreign keys, but now we can not add a row with a category_id value 0, because there is no such entry in the categories table. We could however make the field nullable and add a NULL value.

With this configuration we can have valid orphans – items that do not belong to any category. In the database they would look like this:

This is the way it is intended to be done by MySQL, so one would hope that it is beneficial to follow this pattern. Here is some other interesting discussion on foreign keys.

Back to PHP

Now we have seen that in the MySQL side we want to use NULL to indicate a missing value. Because of that developers kind of expect a missing value to be null on PHP-s side as well. ORM usually handles the mapping for you and all you have to do, is to check against null.

Consistency is very important in any application. This means that a developer is consistent on the terminology and programming patterns. Like I said, there is no best practice way to do things. But if we say that null indicates a missing value, not 0 or an empty string, then this should be consistent throughout the application.

For example, some developer might write this as a check:

if (null !== $value) {
//Do something interesting

Now this will not succeed if someone decides to use 0 or an empty string as the given $value.

Furthermore, null seems to be a logical pick for the unassigned value. The number 0 means something different, as does an empty string.

Other Thoughts 

The null value should not only be used for associations. That is why every field in the database can be nullable. Easiest example is the date_deleted or deleted_at field in the case of soft-deletable entries. If that field has NULL in it, then the entry exists. If it has a datetime, then the entry is considered deleted from the application.

There are other fields that could be nullable. Fields in an entry can be filled in different cases for the user. If a user never goes to the profile editing page and enters a phone number, then we shouldn’t write anything but NULL in the corresponding field. It is not 0 or an empty string, those are the values that a user can go and enter in the corresponding form.

Again, one could write their own values to indicate missing or non-existent data, but it is probably going to cause problems in the long run. For example if there will be another interface that will change those fields in the database. In that case it can happen that you have both 0, empty string and NULL specifying the same semantical concept. But now you want to find out users, who have not filled in their phone number, in order to send them an e-mail about that…