
Machine Learning with PyTorch: Computer Vision Framework For Multi-Class Aerial Land Classification (Part I)
PyTorch stands as a cornerstone in the world of machine learning, renowned for its flexibility, dynamic computation graph, and seamless Python integration. It is distinguished from other frameworks by its intuitive syntax and versatile computation capabilities, which make experimentation and rapid prototyping easy. Its automatic differentiation engine (autograd) computes gradients automatically, which is instrumental in training neural networks with gradient-based optimization. The framework has not only accelerated the pace of research in artificial intelligence but has also found extensive application in industries ranging from healthcare to finance, changing the way we approach and solve intricate machine learning problems.
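As a minimal illustration of automatic differentiation, the snippet below asks PyTorch's autograd for the derivative of a simple function. The function is an arbitrary example chosen for easy checking, not part of the land-classification model.

```python
import torch

# A scalar input that autograd should track
x = torch.tensor(2.0, requires_grad=True)

# y = x^2 + 3x, so dy/dx = 2x + 3
y = x ** 2 + 3 * x

# Backpropagate from y; the gradient dy/dx is stored in x.grad
y.backward()

print(x.grad)  # tensor(7.)
```

The same mechanism, applied to millions of weights instead of one scalar, is what computes the gradients used to train a neural network.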
Using such a powerful tool, I will be constructing what is ultimately a “bad model”.
Building machine learning models with PyTorch involves hundreds of design decisions, most of which depend on the data being used. While a few hundred lines of code can produce a successful neural network for simple computer vision tasks such as binary classification, I wanted to challenge both myself and PyTorch by training a model to identify different types of land in aerial photographs. Because the photographs fall into more than two classes (the land can be categorized in more than two ways), this is a “multi-class” classification problem. And since the images are in color (and color is instrumental in categorization), there is additional complexity the model will have to account for.
Since the dataset being used is not native to PyTorch, and there is no related source code to pull from, this model will have to be tuned across a number of hyperparameters.
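To make the cost of color concrete, the sketch below compares a first convolutional layer for grayscale input with one for RGB input. The 16 filters, 3×3 kernel, and 64×64 image size are illustrative choices, not settings from this project.

```python
import torch
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Hypothetical first layers: 16 filters with a 3x3 kernel
gray_conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)
rgb_conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# Each filter has in_channels * 3 * 3 weights plus one bias
print(count_params(gray_conv))  # 160  (16 * 1*9 + 16)
print(count_params(rgb_conv))   # 448  (16 * 3*9 + 16)

# A batch of 8 RGB images is shaped [batch, channels, height, width]
batch = torch.rand(8, 3, 64, 64)
print(rgb_conv(batch).shape)    # torch.Size([8, 16, 62, 62])
```

Every filter in the first layer has to look at three color channels instead of one, so the layer's weight count roughly triples.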
The best methodology for this kind of iterative experimentation is to create a baseline model with minimal layers and standard parameters, and then iterate on that baseline to improve accuracy and minimize loss. This part covers the creation of the baseline model and the design choices that will be altered down the line.
So fear not, a better model is coming in Part II.
Convolutional Neural Networks
Since this neural network takes in image data (the aerial photographs), the model will be a Convolutional Neural Network (CNN). PyTorch, together with its companion library torchvision, streamlines the pre-processing of raw pixel data, making it easier to load, augment, and transform images. CNNs are a cornerstone of modern computer vision and are particularly well-suited to image processing because each convolutional filter reuses the same weights across the whole image, which keeps the number of trainable parameters manageable and makes training more efficient.
A Convolutional Neural Network is a type of deep learning model designed for processing grid-like data, such as images and videos. CNNs have proven highly effective in computer vision tasks, including image recognition, object detection, and segmentation. A few key components and concepts define a CNN:
Convolutional Layers:
The fundamental building blocks of a CNN. They apply a set of learnable filters (also called kernels) to the input data. These filters slide across the input spatially, performing element-wise multiplications and then summing up the results to produce feature maps.
Pooling Layers:
These layers are used to reduce the spatial dimensions of the feature maps while retaining the most important information. Common pooling operations include max-pooling and average-pooling.
Activation Functions:
Non-linear functions (e.g., ReLU, sigmoid, tanh) are applied after convolutional and fully connected layers to introduce non-linearity into the model, allowing it to learn complex relationships between features.
Fully Connected Layers:
These layers connect every neuron in one layer to every neuron in the next layer, similar to traditional feedforward neural networks. They are typically used towards the end of the network to make predictions based on the learned features.
Flattening:
The process of converting the multi-dimensional feature maps produced by the convolutional layers into a one-dimensional vector that can be fed into the fully connected layers.
Dropout:
A regularization technique used during training to reduce overfitting. It randomly drops a fraction of neurons during each training step, forcing the network to be more robust and generalize better.
Loss Function:
A function that quantifies the error between the predicted output and the actual target. Cross-entropy (often called categorical cross-entropy) is the standard choice for classification tasks, while mean squared error is common for regression tasks.
Optimization Algorithm:
Algorithms such as stochastic gradient descent (SGD), Adam, and RMSprop use the computed gradients to adjust the weights and biases of the network during training in order to minimize the loss function.
Backpropagation:
The algorithm used to compute the gradients of the loss function with respect to the model's parameters. It's essential for efficiently training deep neural networks.
Training and Testing Phases:
The process of training involves feeding the network with labeled data, adjusting the weights using backpropagation, and iteratively minimizing the loss. Testing involves evaluating the model's performance on a separate set of unseen data to assess its generalization ability.
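Several of the components above can be seen working together in a small baseline network. The class name `BaselineCNN`, the layer sizes, the 64×64 input size, and the choice of 5 classes are assumptions for illustration, not the author's final settings.

```python
import torch
import torch.nn as nn

class BaselineCNN(nn.Module):
    """Hypothetical minimal CNN: (conv -> ReLU -> pool) twice, then a classifier."""

    def __init__(self, num_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),                                   # activation function
            nn.MaxPool2d(2),                             # pooling layer: 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32 -> 16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                # [N, 32, 16, 16] -> [N, 8192]
            nn.Dropout(p=0.5),                           # dropout, active only in train mode
            nn.Linear(32 * 16 * 16, num_classes),        # fully connected layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = BaselineCNN(num_classes=5)
images = torch.rand(4, 3, 64, 64)  # a batch of 4 RGB images
logits = model(images)
print(logits.shape)  # torch.Size([4, 5]): one raw score per class
```

The loss function, optimizer, and backpropagation only come into play once this model is trained on labeled data.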
CNNs have transformed computer vision and have been applied successfully in fields including image recognition, medical imaging, and even natural language processing (using 1D convolutions over sequences of text). They are characterized by their ability to automatically and adaptively learn spatial hierarchies of features from data, making them highly effective in tasks that involve visual pattern recognition.
CNNs are also a natural fit because this is a multi-class classification problem. In machine learning, multi-class classification means predicting one of several distinct classes for a given input. Unlike binary classification, where the goal is to distinguish between two possible classes, a multi-class task assigns each input sample to one of multiple predefined classes. For instance, in handwritten digit recognition, the task is to identify digits from 0 to 9; each digit is a separate class, making it a multi-class problem. Neural networks, particularly CNNs, are well-suited to such tasks. The output layer has as many nodes as there are classes, and its raw outputs (called logits) are passed through a softmax function to convert them into probabilities indicating the likelihood that the input belongs to each class. The model is then trained on labeled data, adjusting its parameters through backpropagation to minimize prediction errors and improve its accuracy in assigning inputs to the correct classes.
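To make the output layer concrete, the sketch below takes made-up logits for a single image across three hypothetical land classes, converts them to probabilities with softmax, and computes the cross-entropy loss. PyTorch's `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally, so the model itself does not need a final softmax layer.

```python
import torch
import torch.nn as nn

# Made-up raw outputs (logits) for one image and 3 hypothetical classes
logits = torch.tensor([[2.0, 0.5, -1.0]])
target = torch.tensor([0])  # index of the correct class

# Softmax turns logits into probabilities that sum to 1
probs = torch.softmax(logits, dim=1)
print(round(probs.sum().item(), 6))  # 1.0
print(probs.argmax(dim=1).item())    # 0: the predicted class index

# CrossEntropyLoss takes logits directly; it equals -log(probability of the true class)
loss = nn.CrossEntropyLoss()(logits, target)
print(torch.isclose(loss, -torch.log(probs[0, 0])).item())  # True
```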
At a high level, these are the steps to modeling with CNNs in PyTorch:
Load the relevant libraries and process the data
Create a CNN model with PyTorch
Pick a loss function and optimizer
Train the model
Evaluate the model
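The last three steps can be sketched as a pair of generic helper functions. The names `train_one_epoch` and `evaluate`, the choice of Adam with a learning rate of 1e-3, the tiny model, and the random stand-in data are all assumptions for illustration, not the settings used for the baseline model.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, loader, loss_fn, optimizer):
    """One pass over the training data; returns the mean loss per sample."""
    model.train()  # enable dropout and other training-only behavior
    total_loss = 0.0
    for images, labels in loader:
        optimizer.zero_grad()                  # clear gradients from the last step
        loss = loss_fn(model(images), labels)  # forward pass + loss
        loss.backward()                        # backpropagation
        optimizer.step()                       # update weights
        total_loss += loss.item() * images.size(0)
    return total_loss / len(loader.dataset)

@torch.no_grad()
def evaluate(model, loader):
    """Return classification accuracy on held-out data."""
    model.eval()  # disable dropout
    correct = 0
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
    return correct / len(loader.dataset)

# Hypothetical stand-in data: random 3x32x32 "images" across 4 classes
torch.manual_seed(0)
train_data = TensorDataset(torch.rand(32, 3, 32, 32), torch.randint(0, 4, (32,)))
test_data = TensorDataset(torch.rand(16, 3, 32, 32), torch.randint(0, 4, (16,)))
train_loader = DataLoader(train_data, batch_size=8, shuffle=True)
test_loader = DataLoader(test_data, batch_size=8)

# A deliberately tiny model so the example runs quickly
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(8 * 16 * 16, 4),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(3):
    train_loss = train_one_epoch(model, train_loader, loss_fn, optimizer)
    print(f"epoch {epoch}: loss {train_loss:.3f}")

accuracy = evaluate(model, test_loader)
print(f"test accuracy: {accuracy:.2f}")
```

Because the stand-in labels are random, the accuracy here is meaningless; the point is the shape of the loop, which stays the same once real aerial images are plugged in.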
Code
To start, we’ve got to load all the relevant libraries that are needed and give the appropriate aliases (such as np for numpy). Since this code was originally used in Google Colab, there may be some additional imports needed for other IDEs.
For ease of use, it is recommended to upload the image data directly to Google Drive and access the drive from Colab. Mounting the drive in Colab prompts the user to log into their Google account to authorize the data transfer. The data used for this experiment is the UC Merced Land Use Dataset, which is publicly available for download. Some pre-processing will be necessary, and only some of the dataset's land categories will be used for this model, but having a clean, labeled image dataset is a great start.