Data Analysis in SeaBee
Artificial Intelligence (AI) and machine learning are widely discussed, especially regarding their role as everyday tools in society. Within research, various machine learning and AI methods have been in use for some time. However, there are always new ways to apply these tools in research, and we implement some of these new applications in the SeaBee Research Infrastructure.
We use AI and machine learning extensively within SeaBee, and for good reason. AI often provides excellent performance for a wide range of vision-based tasks where specific objects or features in images need to be identified. The data analysis in SeaBee focuses on implementing state-of-the-art AI methodology for object detection and mapping of relevant variables (for example, types of kelp, benthic habitats, seals or seabirds).
Arnt-Børre Salberg (Norsk Regnesentral) is leading the Data Analysis work, in partnership with NIVA, NINA, NTNU, IMR, and SpectroFly (see About SeaBee).
Data analysis pipelines in place so far
The data analysis pipeline works in two modes: training and inference. In training mode, we train an AI model from training data. In inference mode, we apply the trained model to new drone data. The pipeline is sensor-independent and works on RGB, multi-spectral, and hyper-spectral images. It is not tuned to specific habitat types or animal species; it learns from what is specified in the training data.
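The sketch below illustrates this two-mode setup in Python. It is a minimal, illustrative example rather than the actual SeaBee code: the tiny model, the band count, and the class labels are placeholder assumptions, but it shows how the same code path can serve RGB, multi-spectral, or hyper-spectral input simply by changing the number of input bands.

```python
# Minimal sketch of the two pipeline modes (training and inference).
# Illustrative only, not the actual SeaBee code; the model, band count
# and class count are placeholder assumptions.
import torch
import torch.nn as nn

def build_model(n_bands: int, n_classes: int) -> nn.Module:
    """A tiny per-pixel classifier; the band count lets it accept
    RGB (3 bands), multi-spectral or hyper-spectral input alike."""
    return nn.Sequential(
        nn.Conv2d(n_bands, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(16, n_classes, kernel_size=1),  # per-pixel class scores
    )

def train(model, images, labels, epochs=5):
    """Training mode: fit the model to annotated training data."""
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimiser.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimiser.step()
    return model

def infer(model, new_images):
    """Inference mode: apply the trained model to new drone data."""
    model.eval()
    with torch.no_grad():
        return model(new_images).argmax(dim=1)  # per-pixel class map

# Example with dummy data: 5-band imagery, 4 habitat classes.
images = torch.rand(2, 5, 64, 64)            # (batch, bands, height, width)
labels = torch.randint(0, 4, (2, 64, 64))    # per-pixel class labels
model = train(build_model(n_bands=5, n_classes=4), images, labels)
prediction = infer(model, torch.rand(1, 5, 64, 64))
```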
"The biggest achievement in our Data Analysis work so far is the first version of the data analysis pipeline for both thematic mapping of coastal habitats and detection of animals"
– Arnt-Børre Salberg, NR
How the pipeline works
Drones, flown by SeaBee pilots, collect images and data along the coast of Norway during field missions. The drone data are uploaded into the SeaBee Research Infrastructure and undergo pre-processing steps such as orthorectification and image stitching. The drone images are then fed into an AI model that automatically analyses them. The AI model has been 'trained' on training data to recognise the objects and features the scientists are interested in.
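In outline, the flow looks something like the sketch below. The function names, file layout, and stubbed steps are assumptions for illustration only; in practice the pre-processing is done with dedicated photogrammetry tools and the analysis with the trained SeaBee models.

```python
# Schematic sketch of the processing flow, with stub functions standing
# in for the real pre-processing and AI steps. Names and file layout are
# assumptions, not the actual SeaBee implementation.
from pathlib import Path

def orthorectify_and_stitch(image_paths):
    """Placeholder: in practice done with photogrammetry software."""
    return {"orthomosaic": image_paths}

def run_ai_model(orthomosaic):
    """Placeholder: apply the trained detection/mapping model."""
    return {"detections": [], "habitat_map": None}

def process_mission(mission_dir: str):
    # 1. Drone images uploaded from the field mission.
    raw_images = sorted(Path(mission_dir).glob("*.JPG"))
    # 2. Pre-processing: orthorectification and image stitching.
    orthomosaic = orthorectify_and_stitch(raw_images)
    # 3. Automatic analysis with the trained AI model.
    return run_ai_model(orthomosaic)
```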
The training data
In SeaBee, the training data for the AI models consists of two parts:
- The drone image itself
- The annotation describing the content of the drone images, e.g., bounding boxes locating birds, with corresponding information about species, sex, age, etc., or polygons specifying the habitat class of an area.
The annotation is created by people manually drawing bounding boxes or polygons around the objects or areas of interest in the image. It is often time-consuming, especially if there are, for example, hundreds of birds across many images.
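To make this concrete, the records below show roughly what the two kinds of annotation can look like. The field names, file paths, and values are illustrative assumptions, not the actual SeaBee annotation schema.

```python
# Illustrative examples of the two kinds of annotation; the exact SeaBee
# schema may differ, and all field names and paths here are assumptions.
bird_annotation = {
    "image": "mission_042/DSC_0137.JPG",
    "type": "bounding_box",
    "bbox_xywh": [1024, 768, 46, 38],      # pixel coordinates: x, y, width, height
    "species": "Larus argentatus",         # herring gull
    "sex": "unknown",
    "age": "adult",
}

habitat_annotation = {
    "image": "mission_017/orthomosaic.tif",
    "type": "polygon",
    "polygon_xy": [(210, 40), (380, 55), (365, 220), (200, 205)],  # vertices in pixels
    "habitat_class": "Laminaria hyperborea",                       # kelp
}
```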
However, once enough training data has been collected, the AI models can be trained and added to the SeaBee model registry. The models can then analyse new drone images, making the analysis much faster and more efficient.
Advantages and limits of using Artificial Intelligence
Advantages
The main benefit of applying AI in coastal management is the efficiency gained in analysing image data and the ability to scale the analysis pipeline to handle new data streams.
The AI methodology is data-driven and largely generic, so it can in principle be applied to any mapping question.
Current Limits in SeaBee
For Arnt-Børre, the next priority is to address the brittleness that often occurs in AI models. Brittleness arises when the environment changes in such a way that a small perturbation prevents the computer vision algorithm from recognising the object.
Brittleness in SeaBee happens when the AI algorithm cannot cope with new data acquired under slightly different weather conditions or illumination, or with a new camera. A common response to such brittleness is to gather more training data to fill what is thought to be a perceptual gap. This is the strategy we mainly follow in SeaBee, as drones can collect good-quality training data very efficiently. We expect the AI algorithms to improve in performance as more drone images are collected, annotated, and used as training data.
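Alongside collecting more data, a common way to make models less sensitive to such changes is to augment the existing training images with random variations in lighting and sharpness. The torchvision sketch below is a generic illustration of this idea, not necessarily what the SeaBee pipeline does.

```python
# Generic augmentation sketch: random illumination, blur and flip applied
# to training images so the model sees more varied conditions. Illustrative
# only; parameter values are assumptions.
from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.2),  # lighting changes
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),              # focus / haze variation
    transforms.RandomHorizontalFlip(p=0.5),                                # viewpoint variation
])

# Applied to each training image (PIL image or tensor) before it is fed
# to the model, e.g. inside a dataset's __getitem__:
# augmented = augment(image)
```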
Successful implementation of state-of-the-art machine learning algorithms in SeaBee depends on graphics processing units (GPUs) and high-speed storage provided by the UNINETT/Sigma2 high-performance computing infrastructure.