Applying deep-learning-based algorithms to medical imaging data such as x-rays, CT and MRI scans has proven to be a powerful method for a broad range of image diagnosis, recognition and classification tasks. One of these tasks is the fully automated determination of bone age, a crucial parameter for monitoring and assessing the health status of children. Manual bone age assessment is time consuming, whereas automated prediction based on deep learning models has shown promising accuracy. Our aim is to develop an algorithm that predicts the bone age of children from x-ray images, without a physician in the loop.
But before we get right into it, let’s introduce ourselves: we are a team of four students with different coding backgrounds, ranging from very limited to advanced coding experience. By joining the TechLabs community we gained state-of-the-art tech skills in AI that we now want to put into practice in this project. TechLabs is the winner of the Google Impact Challenge 2018 and creates local communities that bring together experienced AI cracks and complete novices of all disciplines. As such it provides a platform to start engaging with tech, cooperate on interesting first projects or just chat about tech’s ever-growing impact on one’s field. Our article is the byproduct of our first project, which we began in December. All of us are TechLabs members in the organization’s founding city Münster, Germany, and began our curriculum in October 2018. Coming from very different backgrounds, some of us had a little experience with machine learning, others none whatsoever. We hope that there might be someone out there who is just starting her or his machine learning track and might be interested to see how we approached our project or how quickly you can progress in this field. If you are considering broadening your view of this field, make sure to check out TechLabs, as they can provide you with substantial assistance.
In this section, we are presenting our work using deep learning approaches. You can visit our GitHub to see the full code.
Before we dive in, we need a large dataset to train on. In 2017, the RSNA held a contest to correctly identify the age of a child from an x-ray of their hand. The dataset we used for our project was originally published on CloudApp and is also provided on Kaggle. It comprises a number of files: the bone age test and training datasets, which contain the label, gender and age for each image (ranging from 1 to 228 months), as well as all the 12,000 x-ray images (digital and scanned) in PNG format.
Next, we merged the labels from the CSV-file as well as the images into one dataframe. After that, we split our dataset into a training and test set. Now, let’s get started and import all necessary packages.
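The merge and split described above could look roughly like the following sketch. The column names (id, boneage, male), the file naming scheme and the 80/20 split are assumptions for illustration, and the tiny table stands in for the real labels CSV.

```python
import pandas as pd

# Hypothetical stand-in for the labels CSV; the column names are assumptions.
labels = pd.DataFrame({
    "id": [1377, 1378, 1379, 1380, 1381],
    "boneage": [180, 12, 94, 120, 82],  # bone age in months
    "male": [False, False, True, True, False],
})

# Attach the image file that belongs to each label row (assumed naming: <id>.png).
labels["path"] = labels["id"].astype(str) + ".png"

# Hold out 20% of the rows as a test set; the fixed seed makes the split repeatable.
test_df = labels.sample(frac=0.2, random_state=42)
train_df = labels.drop(test_df.index)

print(len(train_df), "training rows,", len(test_df), "test rows")
```

With the real data, the first step would be replaced by reading the CSV file from disk; the rest stays the same.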
2.1. Data Augmentation:
When we train a model for long enough on a limited dataset, it tends to overfit. One way to counter overfitting, especially when we are training for more epochs, is to create more data. This procedure is called data augmentation. It also helps our deep model to generalize better. So how do we “augment” our training samples?
We can start by importing keras image preprocessing:
from keras.preprocessing.image import ImageDataGenerator
Real-time data augmentation can then be carried out by passing a list of parameters to ImageDataGenerator(), which randomly changes the images however we wish without affecting their interpretation. The data is looped over in batches. Here are the few options we have chosen for our minor alterations:
rotation_range: this parameter specifies the range in degrees within which the image is randomly rotated, clockwise or counterclockwise. During that rotation the size of the image remains the same. Note that some of the image regions will be cropped out and some of the regions of the new image will need to be filled.
But we can solve this problem by the following parameter:
fill_mode: It supports a variety of different ways for filling, but here we use “nearest” since it’s the default. This mode assigns the color of the closest existing pixel to the pixel that should be blank. However, other options are available: “constant”, “reflect” or “wrap”.
width_shift_range: shifts the image to a certain direction along the horizontal axis, left or right.
height_shift_range: shifts the image along the vertical axis, up or down.
brightness_range: range for picking a brightness shift value from.
zoom_range: is for randomly zooming the initial image in or out.
horizontal_flip: flips the image with respect to the vertical axis.
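To make two of these options more tangible, here is a small NumPy-only sketch of what a horizontal flip and a width shift with “nearest” filling do to the pixels. It is an illustration of the idea, not the Keras implementation, and the function names are hypothetical.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror the image left to right (flip about the vertical axis)."""
    return img[:, ::-1]

def shift_right_nearest(img, pixels):
    """Shift all columns to the right and fill the gap with the nearest edge column."""
    if pixels == 0:
        return img.copy()
    shifted = np.empty_like(img)
    shifted[:, pixels:] = img[:, :-pixels]  # move the content to the right
    shifted[:, :pixels] = img[:, :1]        # "nearest" fill: repeat the edge column
    return shifted

img = np.array([[1, 2, 3],
                [4, 5, 6]])

flipped = horizontal_flip(img)         # [[3, 2, 1], [6, 5, 4]]
shifted = shift_right_nearest(img, 1)  # [[1, 1, 2], [4, 4, 5]]
print(flipped)
print(shifted)
```

In the generator, the shift and flip are drawn at random for every image in every batch, so the model rarely sees exactly the same picture twice.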
One of the ImageDataGenerator class methods is flow_from_directory(), which reads images from folders (one subfolder per class) and is designed to be used with classification models. Its sibling flow() takes images that are already loaded into a NumPy array. Both help us to generate batches of image data.
Let’s start producing some output. Here’s what we get from our real-time data augmentation:
Figure 1: Data Augmentation
Notice that each image is a bit different from the others due to the parameters we applied, such as zoom, rotation, width or height shift and brightness. This helps the model to generalize. And all of that with just a few lines of code. Impressive, huh?
2.2. Neural Network:
What is a neural network? To get started, we’ll give you a brief explanation: The basic idea behind a neural network is to simulate lots of densely interconnected “brain cells” inside a computer. Strictly speaking, such a network is called an artificial neural network (ANN). And the amazing thing about an ANN? It can learn by itself from examples. But it requires lots of data to train, test and subsequently improve its model. Ok, got it! And what are convolutional neural networks? CNNs are a class of neural networks that have proven very effective in areas such as image recognition. Perfect for the aim of our project, right?
So, our focus in this section will be the development of the neural network, starting with a general description of CNNs and then heading over to an overview of our CNN.
2.2.1 Convolutional Layer:
A CNN image classification model takes an input image, passes it through a series of convolutional, nonlinear, pooling and fully connected layers, and finally gives us an output. An input image is basically an array of pixels in the form of a matrix: h x w x d (h = height, w = width, d = depth, i.e. the number of color channels). Each input image is passed through a series of convolution layers with filters. Such a filter is called a “kernel” and is itself an array of numbers. In practice, a CNN learns the values of these filters on its own during the training process. Let’s imagine this kernel sliding across all areas of the input image with a predefined step width. This step width, the distance the kernel moves each time, is called the “stride”. The figure below displays an exemplary kernel. Kernels are typically square, and 3x3 is a fairly common kernel size.
Figure 2: 3x3 Kernel
A very common kernel for finding the regions in an image where we have a sharp change in intensity or color is the Sobel operator. Depending on its orientation it highlights vertical or horizontal edges, which is very useful for identifying objects and their borders. And what does this mean for our CNN? In convolutional networks, the values in the kernel are trainable: the kernel’s weights start off with random values and change during the training phase.
Ok, let’s slow down.
Take a look at the following figure to understand how a 3x3 kernel is applied over the image:
Figure 3: Convolution
We slide (convolve) the 3x3 matrix (kernel) over our original image 1 pixel at a time (the stride), and at every position the values in the kernel are multiplied with the underlying pixel values of the image. This is called element-wise multiplication. The products are then summed up, so for each region that the kernel covers, one number/pixel is computed.
The result is a Convolved Feature, also called a Feature Map! Without padding, the output of the convolution is smaller (in width and height) than the original image.
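The sliding, multiplying and summing can be written out in a few lines of NumPy. The following is a minimal sketch for illustration only: the function name is hypothetical, the toy image is made up, and the kernel is the fixed Sobel kernel for vertical edges rather than a learned one. As in most deep learning libraries, the kernel is applied without flipping it.

```python
import numpy as np

def convolve2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * stride:i * stride + kh,
                           j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)  # element-wise product, then sum
    return out

# Sobel kernel that responds to vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Toy 5x5 image: dark on the left, bright on the right.
image = np.array([[0, 0, 0, 10, 10]] * 5)

feature_map = convolve2d_valid(image, sobel_x)
print(feature_map.shape)  # (3, 3): smaller than the 5x5 input
print(feature_map)        # each row is [0, 40, 40]
```

The feature map is zero where the image is flat and large where the kernel sits on the dark-to-bright edge, which is exactly the edge-detecting behavior described above.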
2.2.2 Max Pooling Layer:
Pooling works much like convolution: we take a kernel and move it over the image. The difference is that convolution applies a linear function between the kernel and the image window, whereas the pooling function is not linear. Pooling comes in different types. In max pooling, a sample-based discretization process, we take the largest value from the window of the image currently covered by the kernel. The objective is to down-sample an input representation (image, hidden-layer output matrix, etc.), reducing its dimensionality, which lowers the computational cost and helps against overfitting.
As shown in the figure below, we slide our 2 x 2 kernel with a stride of 2 (our step size) and take the maximum value in each region.
Figure 4: Max pooling
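In code, max pooling with a 2 x 2 window and a stride of 2 can be sketched as follows. The function name and the example values are made up for illustration.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Take the maximum of each size x size window, moving by `stride`."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [1, 8, 3, 4]])

pooled = max_pool2d(x)
print(pooled)  # [[6, 4], [8, 9]]
```

The 4 x 4 input shrinks to 2 x 2, and only the strongest value of each region survives.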
2.2.3 Our neural network
2.2.3.1 Network structure
Let’s put into practice what we’ve learned so far.
The input of our neural network, which is shown in the figure below, is a 200x200 image. At the beginning the image runs through two convolutional layers with a stride of 1, a kernel size of 3 and zero-padding, which keeps the spatial size constant through the convolution. These first two convolutional layers are followed by a max pooling layer with a kernel size of 2 and a stride of 2. This reduces the width and height of the layer by a factor of 2. This process is repeated a few times with different numbers of filters. The final output of the combined max pooling and convolutional layers is a 6x6x512 matrix. And next? This matrix is passed on to two fully connected layers, which use the ‘relu’ function as the activation function. The result of the fully connected layers, which are followed by a softmax function, is the predicted value of the bone age.
Figure 5: Neural network — Layers
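The size bookkeeping of this structure can be checked with a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, and the number of pooling stages (five, as in the standard VGG16 layout) is assumed, since the description only says that the process is repeated a few times.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution; padding=1 with a 3x3 kernel keeps the size."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, kernel=2, stride=2):
    """Spatial size after max pooling with a 2x2 window and a stride of 2."""
    return (size - kernel) // stride + 1

size = 200
sizes = [size]
for _ in range(5):                  # assumed: five conv/pool stages
    size = conv_output_size(size)   # zero-padded convolution: size unchanged
    size = pool_output_size(size)   # pooling: size roughly halved
    sizes.append(size)

print(sizes)           # [200, 100, 50, 25, 12, 6]
print(size * size * 512)  # 18432 values are passed to the fully connected layers
```

Note that 25 is not divisible by 2, so the fourth pooling step drops the last row and column and yields 12, which is how a 200x200 input ends up at 6x6.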
2.2.3.2 Training Process
Now let’s go into more detail. The basis of our neural network is the VGG16 net, which we use with weights pre-trained on the ImageNet dataset. ImageNet consists of about 14 million images in roughly 20,000 categories. Such pre-trained networks are becoming more and more common because they speed up the learning process. Only the fully connected layers have to be changed in order to adapt the network to our task. Thanks to the pre-trained weights, our neural net only needs to be trained for 80 epochs to achieve an acceptable accuracy.
2.3. Intensity map
Intensity or heat maps are a simple technique for revealing the image regions a CNN uses to identify a specific class in the image. To get this heat map, the activation of the last convolutional layer is analyzed and overlaid on the input image. So why should we create such a heat map if it doesn’t actually improve our results or accuracy? Simply speaking, because we are not thinking in a binary way. We are not computers. And in order to understand what our algorithms are doing, it can be pretty helpful to visualize what they are up to. In this case, our intensity or heat map shows us pretty clearly what the algorithm is interested in. It thus gives us the opportunity to better understand where the interesting parts lie or, if we know better, to notice that our algorithm might be looking the wrong way. Curious? You can find the heat map in the results part (3.2).
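As a rough illustration of the idea, the sketch below turns a block of last-layer activations into a heat map of the same size as the input image. It is a simplified stand-in, not necessarily the method used in this project: it averages all channels with equal weight, the function name is hypothetical, and the activations are random numbers with one artificially boosted location.

```python
import numpy as np

def activation_heatmap(activations, out_size):
    """Average the channels of an (h, w, channels) activation block,
    scale to [0, 1] and enlarge to out_size x out_size by repeating cells."""
    heat = activations.mean(axis=-1)
    heat = np.maximum(heat, 0)  # keep only positive evidence
    peak = heat.max()
    if peak > 0:
        heat = heat / peak
    # Nearest-neighbor upsampling: map every output pixel to its source cell.
    rows = np.arange(out_size) * heat.shape[0] // out_size
    cols = np.arange(out_size) * heat.shape[1] // out_size
    return heat[np.ix_(rows, cols)]

rng = np.random.default_rng(0)
acts = rng.random((6, 6, 512))  # made-up activations of a 6x6x512 layer
acts[2, 3, :] += 5.0            # pretend one region is strongly activated

heatmap = activation_heatmap(acts, 200)
print(heatmap.shape)  # (200, 200), ready to be overlaid on the x-ray
```

Plotting the resulting array semi-transparently on top of the input image gives a picture in which the boosted region lights up.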
3.1. Bone age prediction:
As the figure below shows, predicting the patients’ age from the x-ray images works. Hungry for more convincing proof? A more detailed view of the overall results is shown in the graphic below the bone images, which plots the predicted bone age against the real bone age. The graphic shows a fairly good linear relationship between both values. The mean error of the prediction is 20 months, which means that on average our neural network predicts a patient’s age to within two years.
Figure 6: Prediction of the patients’ age based on the x-ray images
Figure 7: Predicted bone age against the real bone age.
From this plot it also becomes clear that our algorithm works best in the medium age range of our dataset: it underestimates especially high ages and overestimates the age of our really young fellow human beings. However, that shouldn’t surprise us, since the dataset our model was trained on has less data for the more extreme age groups (as you can see from the number of red dots).
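Both the mean error and the over- and underestimation at the extremes are easy to quantify. The sketch below uses made-up numbers purely for illustration; they are not our results, and the age thresholds for “young” and “old” are arbitrary assumptions.

```python
import numpy as np

# Made-up example values in months, NOT the project's actual predictions.
true_age = np.array([12, 24, 96, 120, 144, 204, 216])
pred_age = np.array([30, 36, 100, 118, 150, 180, 190])

errors = pred_age - true_age   # positive: overestimated, negative: underestimated
mae = np.abs(errors).mean()    # mean absolute error in months

young = true_age < 48          # assumed threshold for "really young"
old = true_age > 192           # assumed threshold for "especially high ages"

print(f"mean absolute error: {mae:.1f} months")
print(f"mean signed error, young: {errors[young].mean():+.1f} months")
print(f"mean signed error, old:   {errors[old].mean():+.1f} months")
```

A positive signed error for the young group and a negative one for the old group is the numerical counterpart of the pattern visible in the plot.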
As described thoroughly in the part above, a heat map can determine the interesting areas on which the neural network bases its predictions. The following graphic shows that the interesting areas for our neural network are the carpal bones. Interestingly, this goes along with an old and really well-established method used in forensic medicine. Doctors and scientists have long used carpal bones to assess the age of children or unknown suspects. Sure — all these pictures present the carpal bones to the algorithm. But still, isn’t it amazing that our little experiment reaches the same conclusion as human doctors decades ago? Namely, that if you want to assess the age of a child it makes sense to look at the different parts (epiphyses) of metacarpal bones and the ossification of carpal bones.
Figure 8: Heat map
4. Conclusion and Learnings
We are a very heterogeneous team and we all started our TechLabs tracks with different prior knowledge and varying expectations. Some of us had never heard anything about machine learning, others had experimented with these techniques before. But sitting down and talking to each other now, it becomes ever clearer that we all learned something in these past few months, however different this may be in the individual case. Some of us can now read machine learning articles without breaking a sweat, others are comfortable using pre-trained algorithms for their own data and research. For one of us, machine learning and neural networks have even found their way into his workplace, where he has started building his first custom models to tackle recurring work problems that had puzzled him for way too long. We might not all make machine learning our living now, but improving our knowledge of this crucial technology at whatever level is helpful for better understanding today’s world and is a requirement for being part of a rapidly evolving conversation.
This project has also been really helpful in another way. Even though our algorithms take a completely different approach to solving the task than doctors do, they came up with similar solutions. As demonstrated in the heat map above, our neural network and doctors both decided to look at the same parts of the image for solving the task at hand. While AI may often solve problems in ways we can hardly grasp, this demonstrably does not always have to be the case. The solution at hand is an excellent example of how human minds and algorithms can sometimes validate each other, or possibly even work hand in hand, a scenario that we are now looking forward to.
5. Information about the project team
Everything from this blog and the entire library can be found in the following Github Repo.
Laura Gutewort: Master student of psychology at the University of Münster