Machine Learning in Geoscience

The application of machine learning (ML) in geoscience is an evolving field. ML is concerned with developing algorithms that learn from machine-readable data. It spans several areas, including data mining, applications that are difficult to program explicitly, and software applications.

More capable robots and advanced algorithms can now independently execute various activities previously believed to be impossible to complete without any human interaction.

For example, without explicit instruction, machine learning algorithms can learn to accomplish challenging tasks such as speech, face, and object recognition, or to play the classic game of Go and even outperform the best human players. These algorithms are fed by the massive amount of data now available.

Introduction to Machine Learning in Geoscience

Many data-intensive scientific domains, especially those related to Earth Sciences, require a growing proficiency in machine learning. Researchers have discovered that machine learning performs better in earth science than conventional statistical models, particularly when it comes to defining geologic facies, predicting climate-induced range shifts, and characterizing forest canopy structures.

Datasets in many areas of the geosciences are growing in size and variety at an extraordinarily fast rate. This growth highlights the need for innovative data processing and assimilation techniques that can take advantage of the knowledge contained in this data explosion.

The application of machine learning techniques has the potential to advance the methods utilized in several geoscience sectors for data processing. We suggest a summer school in this context that focuses on applying machine learning techniques to geophysical, geological, and environmental data.

Machine Learning in Remote Sensing

There is growing interest in utilizing machine learning in the geoscience field to analyze remote sensing data automatically and improve our comprehension of intricate environmental systems.

Environmental remote sensing uses satellites and airborne platforms to gather information about the environment. It generates enormous amounts of data and is today regarded as a Big Data application. To analyze these data more effectively, experts are employing artificial intelligence (AI) methods such as machine learning.

Machine learning algorithms reduce the need for human intervention by enabling a system to learn from data and experience without being explicitly programmed. This data-driven approach allows useful information about a natural phenomenon to be extracted from the data alone. Its benefits include the ability to handle more complex environmental data, but it also has drawbacks, such as limited data accessibility.

Machine Learning in Geological Mapping

Numerous Earth science applications, including environmental monitoring and mineral extraction, benefit from remotely sensed spectral images, geophysical (magnetic and gravity) data, and geodetic (elevation) data. Preliminary geological mapping and interpretation may be improved by combining these data with Machine Learning Algorithms (MLA), which are frequently employed in image analysis and statistical pattern recognition. Compared with traditional field expeditions, this approach makes geological mapping faster and more objective.

The Tellus initiatives in Northern Ireland and southwest England have thus far served as examples of the recent trend toward the collection of ever-more quantitative data. These programs involved a series of surveys that combined ground and aerial surveys to collect data on both geochemical and geophysical processes.

The appeal and usefulness of these multidisciplinary surveys lie in the fact that they treat the huge, multivariate datasets as a unified system, a fully quantified depiction of the geo-environment as a whole, rather than analyzing each of the many independent variables separately.

Machine learning has thus far been used mostly to model and predict geochemistry, because chemical composition directly quantifies what a rock or soil actually is and therefore provides an appropriate data format for translating conventional mapping operations into the digital world.

However, we will need to supplement our data systems with measurements of a larger range of geo-properties as interest in developing an evidence-based understanding of the behavior of the rocks beneath our feet rises. For instance, in order to provide trustworthy decision-support tools for a larger user base, we can include data on mechanical qualities, hydrological properties, and so on, in addition to data on chemical composition.

For the time being, machine learning is successfully enabling us to convert multimodal aerial survey data into precise, high-resolution maps of the chemical composition of the rocks and soils that are the basis of our society.

Machine Learning Methods in Geoscience

(i) Geological mapping and mineral prospectivity mapping

Geological mapping produces maps that display geological features and geological units. Mineral prospectivity mapping combines a range of information, such as geological maps and aeromagnetic images, to create maps designed specifically for mineral exploration.

Machine learning algorithms can process spectral images derived from remote sensing, together with geophysical data, for geological and mineral prospectivity mapping. Whereas conventional imaging records only three wavelength bands of the electromagnetic spectrum (red, green, and blue), spectral imaging records many different wavelength bands.

When dealing with remote sensing geophysical data, algorithms like Random Forest and Support Vector Machine (SVM) are frequently used. In contrast, Convolutional Neural Networks (CNN) and Simple Linear Iterative Clustering-Convolutional Neural Network (SLIC-CNN) are frequently used when working with aerial photographs and images.

It is possible to carry out large-scale mapping using geophysical data from airborne and satellite remote sensing and smaller-scale mapping using images from Unmanned Aerial Vehicles (UAV) for higher resolution.

The figure below shows the evolution of machine learning techniques in geoscience in the last 70 years.


Figure: Evolution of machine learning in geoscience. Source: Dramsch, J.S. (2020). 70 years of machine learning in geoscience in review. Advances in Geophysics, 61, 1–55.

(ii) Landslide susceptibility and hazard mapping

The local terrain conditions determine the probability of a landslide in a particular location. Therefore, landslide susceptibility mapping can identify areas at risk of landslides, which is useful for urban planning and disaster management. Input datasets for machine learning algorithms typically include topographic information, lithological information, satellite images, and so on, with some studies including land use, land cover, drainage information, and vegetation cover.

Training a machine learning model for landslide susceptibility mapping requires both a training dataset and a testing dataset.

There are two ways to divide a study area into these datasets: randomly splitting the whole study area, or splitting it into two adjacent regions. Random splitting is the common practice for testing classification models. Splitting into two adjacent regions, however, is more useful for automation, because it shows whether the algorithm can map a new area using expert-processed data from the neighboring land.
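The two splitting strategies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular study: the 10 x 10 grid of cells, the 30% test fraction, and the function names `random_split` and `adjacent_split` are all hypothetical.

```python
import random

def random_split(cells, test_fraction=0.3, seed=0):
    """Shuffle all grid cells, then cut the shuffled list into train and test."""
    shuffled = list(cells)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def adjacent_split(cells, boundary_x):
    """Train on cells west of boundary_x, test on the adjacent area to the east."""
    train = [c for c in cells if c[0] < boundary_x]
    test = [c for c in cells if c[0] >= boundary_x]
    return train, test

# Hypothetical 10 x 10 study area; each cell is identified by its (x, y) index.
cells = [(x, y) for x in range(10) for y in range(10)]

rand_train, rand_test = random_split(cells)
adj_train, adj_test = adjacent_split(cells, boundary_x=7)

print(len(rand_train), len(rand_test))  # 70 30
print(len(adj_train), len(adj_test))    # 70 30
```

Both splits yield the same number of cells here, but only the adjacent split keeps the test cells in a contiguous area that the model has never seen, which is closer to the situation of mapping new ground.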

(iii) Discontinuity analysis

Discontinuities such as fault planes and bedding planes have important implications for engineering. Machine learning can identify rock fractures automatically through photogrammetric analysis, even in the presence of interfering features such as foliation and rod-shaped vegetation. Data augmentation, a common technique for enlarging the training dataset and preventing overfitting, is used when training models to recognize features in photos.
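As a rough sketch of what data augmentation by flipping and random cropping can look like, the following Python example works on a small image stored as a list of pixel rows. The image, the crop size, and the helper names are hypothetical and chosen only for illustration; real pipelines would typically use an image library.

```python
import random

def flip_horizontal(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]

def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    top = rng.randrange(len(image) - crop_h + 1)
    left = rng.randrange(len(image[0]) - crop_w + 1)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

def augment(image, n_crops, crop_h, crop_w, seed=0):
    """Return n_crops random crops of the image and of its mirrored copy."""
    rng = random.Random(seed)
    out = []
    for source in (image, flip_horizontal(image)):
        for _ in range(n_crops):
            out.append(random_crop(source, crop_h, crop_w, rng))
    return out

# Hypothetical 4 x 6 grayscale image standing in for an outcrop photograph.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
samples = augment(image, n_crops=3, crop_h=2, crop_w=3)
print(len(samples))  # 6
```

Each original photo yields several slightly different training samples, which is how a small set of photos can grow into a much larger training dataset.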

For instance, in a study on identifying rock fractures, the available photos were randomly split into 68 training images and 23 testing images. Data augmentation by flipping and random cropping was then used to add 8704 more images to the training dataset. In most cases, the method identified the rock fractures effectively. Both the specificity and the Negative Predictive Value (NPV) exceeded 0.99, which illustrates how reliable machine learning-based discontinuity analysis can be.
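For reference, specificity and NPV are both computed from the counts of a confusion matrix: specificity is TN / (TN + FP) and NPV is TN / (TN + FN). The short Python sketch below shows the arithmetic; the counts are hypothetical and are not the values from the study.

```python
def specificity(tn, fp):
    """Share of truly negative samples that were predicted negative."""
    return tn / (tn + fp)

def negative_predictive_value(tn, fn):
    """Share of negative predictions that are truly negative."""
    return tn / (tn + fn)

# Hypothetical counts: true negatives, false positives, false negatives.
tn, fp, fn = 9950, 50, 40

print(round(specificity(tn, fp), 4))                # 0.995
print(round(negative_predictive_value(tn, fn), 4))  # 0.996
```

Both metrics describe performance on the negative class, here the samples that contain no fracture.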

(iv) Carbon-dioxide leakage detection

The public is increasingly interested in whether carbon dioxide is stored underground securely and successfully, so measuring carbon dioxide leakage from geologic sequestration sites has gained attention. A geologic sequestration site is a location where greenhouse gases are captured and stored in underground geological formations.

Carbon dioxide leakage from a geologic sequestration site can be detected indirectly through the stress response of plants, using remote sensing and an unsupervised clustering technique (the ISODATA method). An increase in soil CO2 concentration stresses plants by displacing oxygen and thereby suppressing plant respiration. The stress signal from vegetation can be identified with the Red Edge Index (REI).

The unsupervised technique processes the hyperspectral images by clustering pixels with similar plant responses. Hyperspectral data were collected in areas with known CO2 leakage so that the clusters of pixels with spectral anomalies could be matched against the known leakage sites. The method detects CO2 leakage effectively, but several limitations need further research.
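The idea of grouping pixels by similar response can be illustrated with plain k-means clustering. Note that this is a simplified stand-in: ISODATA builds on the same assign-and-update loop but additionally splits and merges clusters, which is not shown here. The per-pixel index values and the starting centers are hypothetical.

```python
def kmeans_1d(values, centers, n_iter=20):
    """Plain k-means on scalar values; returns final centers and labels."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each pixel joins the nearest cluster center.
        labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                  for v in values]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, labels

# Hypothetical per-pixel index values; low values stand in for stressed plants.
pixel_index = [0.21, 0.25, 0.19, 0.62, 0.58, 0.66, 0.23, 0.61]
centers, labels = kmeans_1d(pixel_index, centers=[0.0, 1.0])
print(labels)  # [0, 0, 0, 1, 1, 1, 0, 1]
```

No labels are supplied to the algorithm; the clusters only become meaningful once they are compared with locations where leakage is known to occur.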

For example, some stressed pixels were mislabeled as healthy because the Red Edge Index (REI) can be inaccurate owing to factors such as greater chlorophyll absorption, variations in vegetation, and shadowing effects. The vegetation’s stress response to CO2 may also be influenced by seasonality and groundwater table height.

(v) Measurement of water inflow

The Rock Mass Rating (RMR) system is a geomechanical rock mass classification scheme used worldwide that requires six input parameters. One of them, the amount of water inflow, represents the groundwater condition. Water inflow at the faces of a rock tunnel has traditionally been assessed in the field by visual inspection, which is labor-intensive, time-consuming, and dangerous.

Machine learning can determine the water inflow by analyzing photos taken at the construction site. The classification largely follows the RMR system, but it combines the damp and wet states because they are difficult to tell apart by visual inspection alone. The images were divided into five categories: undamaged, wet, dripping, flowing, and gushing.

(vi) Soil classification

Cone Penetration Testing (CPT) is the most common and affordable method for investigating soil. In the test, a metallic cone is pushed into the ground at a predetermined rate, and the force required to do so is recorded as a quasi-continuous log.

CPT log data can be used to classify soil with machine learning. The task has two components: segmentation and classification. The Constraint Clustering and Classification (CONCC) algorithm can be used to split a single data series into segments.

Support Vector Machines (SVM), Decision Trees (DT), and Artificial Neural Networks (ANN) can all perform the classification task. A comparison of the three methods shows that the ANN performed best at classifying humous clay and peat.

In contrast, Decision Trees performed best at classifying clayey peat. Even for the most difficult case, the approach reached a high accuracy of 83%, and the misclassified samples were assigned to a geologically neighboring class. Because this level of accuracy is sufficient for most specialists, the approach can, for practical purposes, be regarded as 100% accurate.

(vii) Earthquake early warning systems and forecasting

Earthquake early warning systems are often vulnerable to impulsive local noise, which triggers false alarms. Using machine learning to discriminate earthquake waveforms from noise signals can prevent such false alarms. The approach has two stages.

The first stage involves unsupervised learning with a Generative Adversarial Network (GAN) to learn and extract features of first-arrival P-waves. The second stage uses a Random Forest to discriminate P-waves from noise. The method recognized P-waves with an accuracy of 99.2% and avoided false triggers from noise signals with an accuracy of 98.4%.

Laboratory earthquakes are produced to mimic those that occur in nature. Machine learning can identify the patterns in acoustic signals that act as precursors to these earthquakes, without manual searching. A study using continuous acoustic time series data recorded from a laboratory fault showed that the time remaining before failure can be predicted.

The algorithm used, Random Forest, predicted the remaining time to failure well after being trained on about 10 slip events. The study also discovered one of the acoustic signals used to anticipate failure. Although a laboratory earthquake is not as complex as a natural one, this represents a significant advance that will help guide future research on earthquake prediction.
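One plausible way to prepare a continuous acoustic signal for a model such as Random Forest is to cut it into windows and summarize each window with a few statistics; each row of statistics then serves as an input, with the time remaining until failure as the target. The Python sketch below shows only this feature-extraction step. The window size, the chosen statistics, and the signal values are assumptions for illustration and are not taken from the study.

```python
import statistics

def window_features(signal, window_size):
    """Split a signal into non-overlapping windows and summarize each one."""
    features = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        window = signal[start:start + window_size]
        features.append({
            "mean": statistics.fmean(window),
            "variance": statistics.pvariance(window),
            "peak": max(abs(x) for x in window),
        })
    return features

# Hypothetical acoustic amplitudes that grow as failure approaches.
signal = [0.1, -0.1, 0.1, -0.1, 0.4, -0.4, 0.4, -0.4, 0.9, -0.9, 0.9, -0.9]
feats = window_features(signal, window_size=4)
print([round(f["variance"], 2) for f in feats])  # [0.01, 0.16, 0.81]
```

In this toy signal the variance rises from window to window, which is the kind of pattern a regression model could learn to associate with a shrinking time to failure.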

Machine Learning Application Challenges in Geoscience

(i) Black-box operation

Many machine learning techniques, including Artificial Neural Networks (ANN), are regarded as “black-box” approaches because there are no known relationships or descriptions of how results are generated in the hidden layers. With a “white-box” method such as a decision tree, by contrast, users can inspect the details of the algorithm. “Black-box” approaches are therefore poorly suited to examining relationships, although they typically offer better performance.

(ii) Limitations of the input data

Some tasks that seem simple for humans may be difficult for machine learning. For instance, when water inflow at rock tunnel faces is quantified from photographs for the Rock Mass Rating (RMR) system, damp and wet surfaces are not easily distinguished by machine learning. This is possibly because the two cannot be told apart by visual inspection alone. In some tasks, machine learning might not be able to replace human labor.

(iii) Inadequate training data

Machine learning requires a sufficient amount of training and validation data. However, some very useful products, such as satellite remote sensing data, have records that go back only a few decades, to the 1970s. For anyone interested in yearly data, fewer than 50 samples are available.

(iv) Vegetation cover

According to various studies, vegetation cover is one of the major challenges for geological mapping using remote sensing, both on a large and small scale. This is because vegetation degrades the spectral image quality and obscures the rock information in aerial images.