
Computer Vision and GenAI in
Geoscience
YOHANES NUWARA
PETROLEUM ENGINEERS ASSOCIATION (PEA)
Trondheim, 28.07.2024

Yohanes Nuwara
Career:
● Data scientist at Prores AS, Norway (2024-)
○ Computer vision for porosity and permeability prediction from core images
● Lead data analyst at APP Sinarmas, Indonesia (2022-2023)
○ Sustainability dashboard for management
● Expert data scientist at APP Sinarmas, Indonesia (2022-2023)
○ LiDAR, computer vision for remote sensing UAV
● Research engineer at OYO Corporation, Japan (2020-2022)
○ Distributed Acoustic Sensing (DAS) for earthquake seismology
Education:
● Politecnico di Milano, Italy (2023-)
○ Master’s in Business Analytics and Big Data
● Bandung Institute of Technology, Indonesia (2015-2019)
○ Bachelor’s in Geophysical Engineering

Outline
● What is computer vision?
● Computer vision methods and models
● Use case 1: Automatic rock typing using segmentation model
● Use case 2: Boulder detection for seabed mapping
● Challenges in computer vision
● What is Generative AI?
● Generative vision models
● Conclusion

What is computer vision?
Computer vision is a scientific field whose task is to understand images from their pixel
information, using traditional image analysis and artificial intelligence

Why is computer vision growing so fast?
Rapid growth of computers
and hardware chips
Bigger, modern, and more
secure data storage
Rapid evolution of AI
computer vision models
More and more advanced optics and
camera technologies

Computer Vision in geoscience
Seismic interpretation Petrophysics
Geology Remote sensing

How do computers see images?
● Computers see an image as pixels (the unit of an image, with 3 color channels: Red, Green, Blue)
● Color is represented by the intensity value of each color channel (0 ≤ Intensity ≤ 255 for 8-bit images)
● Therefore, an image is a 3-D array → (pixel width, pixel height, channels)
● In remote sensing, an image can have more than 3 color channels
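The array view of an image can be sketched in a few lines. This is a minimal illustration with made-up pixel values, using plain nested lists rather than any imaging library. Note that many array libraries order the axes as (height, width, channels), which is the convention assumed here.

```python
# A tiny 2x3 RGB image stored as nested lists: rows -> pixels -> channels.
# Pixel values are illustrative only.
image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],        # row 0: red, green, blue
    [[0, 0, 0], [128, 128, 128], [255, 255, 255]],  # row 1: black, gray, white
]


def image_shape(img):
    """Return (height, width, channels) of a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))


def channel(img, index):
    """Extract one color channel as a 2-D grid of intensities."""
    return [[pixel[index] for pixel in row] for row in img]


shape = image_shape(image)
red = channel(image, 0)
print(shape)  # (2, 3, 3)
print(red)    # [[255, 0, 0], [0, 128, 255]]
```

A multispectral remote sensing image would simply have more than 3 values per pixel in the last axis.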
Illustration of pixel representation on the LCD screen Color channels of an image
Pixel width
Pixel height

Types of computer vision tasks
Task | Definition | Example
Image classification | Classify objects from two or more classes | Classify malignant versus benign tumor from images
Image regression | Predict a value from image | Predict the price of car from images
Object detection | Locate the object on an image (bounding box) | Locate ripe fruits on the tree from images
Object segmentation | Segment the boundary of an object | Segment crack on roads from images
Keypoint (or pose) detection | Identify the components of an object | Identify parts of an animal body from images

Convolutional Neural Network (CNN)
A CNN is a type of neural network that processes images (N-dimensional arrays) using hierarchical layers of
interconnected neurons and convolutional operations to automatically learn and extract features from images and
perform identification tasks on them.
● Conv layer: learns features from the image
● Pooling layer: reduces the feature maps and spatial dimension
● Flatten layer: converts the N-dimensional output to 1-dimensional, e.g. [[1, 2], [3, 4]] → [1, 2, 3, 4]
● Fully connected layer: dense neural network that makes predictions based on the features
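The four CNN building blocks can be sketched without any deep learning library. This is a toy forward pass only (no training), and the image, kernel, and dense weights are hypothetical values chosen for illustration.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution with stride 1 (cross-correlation, as used in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


def flatten(fmap):
    """Convert a 2-D feature map to a 1-D feature vector."""
    return [v for row in fmap for v in row]


def dense(features, weights, bias):
    """Single-output fully connected layer (no activation)."""
    return sum(f * w for f, w in zip(features, weights)) + bias


# 5x5 grayscale image with a vertical edge (dark left, bright right)
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to vertical edges
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, kernel)   # 4x4 map, strongest where the edge is
pooled = max_pool(feature_map)        # 2x2 map
flat = flatten(pooled)                # 4 features
score = dense(flat, [0.5, 0.5, 0.5, 0.5], 0.0)
print(pooled, flat, score)  # [[2, 0], [2, 0]] [2, 0, 2, 0] 2.0
```

In a real CNN the kernel and dense weights are learned from data rather than fixed by hand.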

Transfer Learning
● Transfer learning is a concept in neural networks that lets us “re-use” available models, train them on
our use case, and fine-tune them
● Transfer learning models come with pretrained weights
Residual Net (He et al, 2015) Inception Net (Szegedy et al, 2016)
VGG 16 (Liu et al, 2016)
Mobile Net (Howard et al, 2017)

State-of-the-Art (SOTA) models
● Computer vision SOTA models are combinations of convolutional networks
● Generally, SOTA models have 3 components: Backbone, Neck, Head
● Popular SOTA models: Faster R-CNN (2015), Mask R-CNN (2017), and YOLO (2015-2024)
● Backbone: processes the image input, learns through a DCNN, and produces the feature map
● Neck: generates region proposals
● Head: localizes the object (bounding boxes, segments)

Segmentation Models
Detectron2 (Facebook/Meta, 2019) Segment Anything Model (Facebook/Meta, 2023)
U-Net (Ronneberger et al, 2015) Mask R-CNN (He et al, 2017)

Detection Models
Mask R-CNN (He et al, 2017) YOLO Models (2015-)
Template Matching (Graf and Zisserman, 1988) Faster R-CNN (Ren et al, 2015)

Keypoint Detection
● An object consists of a predefined set of keypoints and the connections between them
● Very popular in human movement analysis (pose detection)
● Popular models include the pose variant of YOLO version 8 (YOLOv8 Pose)
“Objects as keypoints” Human movement analysis
(Source: OpenVino)

Use case 1: Automatic rock
typing from core

Core image interpretation
Drilling activity
Core sample
Lithology description done by
petrophysicists
● Drill core provides geological evidence that is used to find oil in the rocks underneath
● The core is then brought to the lab to be analysed
● Lithology description is done by petrophysicists in the lab
● It’s a very lengthy process!

Automatic rock typing from core image?
Instead of humans conducting the lithology description, how about teaching neural networks to
describe the lithology (later supervised by humans)?

Labelling and annotation
500 images are carefully segmented into different lithology classes, namely: Bioturbated
mudstone/sandstone, Massive mudstone/sandstone, Parallel-laminated mudstone/sandstone,
Cross-bedded/graded-bedded sandstone, Current-rippled sandstone, Conglomerate, Fissile shale, and Heterolithic

Distribution of lithology classes
Samples are too few
● Imbalance between the number of instances per class can severely affect the performance of a computer
vision model
● Imbalance biases a high accuracy score towards the classes that have more instances than the others
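The accuracy bias can be shown with a toy example. The class counts below are hypothetical (not the actual distribution in this dataset), and the "model" simply predicts the majority class every time.

```python
from collections import Counter

# Hypothetical imbalanced labels: 95 sandstone instances, 5 conglomerate
labels = ["sandstone"] * 95 + ["conglomerate"] * 5

# A trivial "model" that always predicts the most frequent class
majority = Counter(labels).most_common(1)[0][0]
predictions = [majority] * len(labels)

accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def recall(cls):
    """Fraction of true instances of `cls` that were predicted as `cls`."""
    relevant = [p for p, t in zip(predictions, labels) if t == cls]
    return sum(p == cls for p in relevant) / len(relevant)


print(accuracy)                # 0.95
print(recall("sandstone"))     # 1.0
print(recall("conglomerate"))  # 0.0
```

Accuracy looks high even though the minority class is never detected, which is why per-class metrics matter on imbalanced data.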

Data augmentation
● Data augmentation is a strategy to mitigate class imbalance
● It applies different manipulations to the image, such as rotation, flipping, and color space
shifting
● Augmentation can also improve model generalization by training the model on different image
conditions
Note: Only some augmentations are useful for a particular use case.
In core facies segmentation, where color is important, the augmentations in the red box cannot be used.
Why?
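A minimal sketch of the three manipulations named above, on a tiny image of RGB tuples with made-up values. Flips and rotations only move pixels around, while a color shift changes the pixel values themselves, which is one reason color-altering augmentations can be unsuitable when color is diagnostic of the class. For segmentation, the polygon labels would also need the same geometric transform (not shown here).

```python
def hflip(img):
    """Horizontal flip: reverse each row."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def shift_color(img, delta):
    """Add a per-channel offset and clip the result to the 0-255 range."""
    return [
        [tuple(min(255, max(0, v + d)) for v, d in zip(pixel, delta)) for pixel in row]
        for row in img
    ]


# 2x2 RGB image with illustrative values
img = [
    [(200, 120, 80), (90, 90, 90)],
    [(30, 30, 30), (250, 240, 230)],
]

flipped = hflip(img)
rotated = rotate90(img)
shifted = shift_color(img, (40, 0, -40))

print(flipped[0])  # [(90, 90, 90), (200, 120, 80)]
print(rotated[0])  # [(30, 30, 30), (200, 120, 80)]
print(shifted[1])  # [(70, 30, 0), (255, 240, 190)]
```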

Model training
Annotating class 3 (Par. lam. sandstone) with polygon vertices (x1,y1), (x2,y2), …, (x9,y9)
Label line: 3 x1 y1 x2 y2 x3 y3 …
● To form training data, polygons (segments) need to be converted to a numerical representation
● The conversion follows the YOLO format: class index followed by the polygon coordinates
● Train, validation, and test split: 75%-15%-10%
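The conversion and split can be sketched as follows. This assumes the usual YOLO segmentation label convention (class index followed by polygon coordinates normalized by image width and height). The function names, file names, and image size are hypothetical, not taken from the original workflow.

```python
import random


def polygon_to_yolo(class_id, polygon, img_w, img_h):
    """Convert pixel-coordinate polygon vertices to one YOLO segmentation label line."""
    coords = []
    for x, y in polygon:
        coords.append(f"{x / img_w:.6f}")  # x normalized by image width
        coords.append(f"{y / img_h:.6f}")  # y normalized by image height
    return " ".join([str(class_id)] + coords)


def split_dataset(items, fractions=(0.75, 0.15, 0.10), seed=0):
    """Shuffle and split items into train, validation, and test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * fractions[0])
    n_val = round(len(items) * fractions[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


# Class 3 polygon (three vertices for brevity) on a hypothetical 400x800 core image
line = polygon_to_yolo(3, [(100, 50), (300, 50), (300, 400)], img_w=400, img_h=800)
print(line)  # 3 0.250000 0.062500 0.750000 0.062500 0.750000 0.500000

# Hypothetical file names for the 500 annotated images
files = [f"core_{i:03d}.png" for i in range(500)]
train, val, test = split_dataset(files)
print(len(train), len(val), len(test))  # 375 75 50
```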

Model evaluation
● How good and accurate is our model?
● Important metrics for instance
segmentation:
○ Classification metrics: Precision,
Recall, F1-score
○ Loss: Dice loss, IoU loss
○ Accuracy: Mean Average Precision
(mAP50)
● A confusion matrix shows the False
Positives and False Negatives of the model
result
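A small sketch of how these metrics are computed, using made-up counts and masks. Dice loss and IoU loss are taken here as 1 minus the respective overlap score, and mAP50 counts a prediction as correct when its IoU with the ground truth is at least 0.5.

```python
def precision_recall_f1(tp, fp, fn):
    """Classification metrics from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def iou_and_dice(pred_mask, true_mask):
    """Overlap scores between two binary masks given as nested lists of 0/1."""
    pred = {(r, c) for r, row in enumerate(pred_mask) for c, v in enumerate(row) if v}
    true = {(r, c) for r, row in enumerate(true_mask) for c, v in enumerate(row) if v}
    inter = len(pred & true)
    union = len(pred | true)
    iou = inter / union if union else 1.0
    dice = 2 * inter / (len(pred) + len(true)) if (pred or true) else 1.0
    return iou, dice


# Hypothetical confusion-matrix counts for one lithology class
precision, recall, f1 = precision_recall_f1(tp=6, fp=2, fn=6)
print(precision, recall)  # 0.75 0.5

# Hypothetical predicted and ground-truth masks
pred_mask = [[1, 1, 0], [1, 1, 0]]
true_mask = [[0, 1, 1], [0, 1, 1]]
iou, dice = iou_and_dice(pred_mask, true_mask)
iou_loss, dice_loss = 1 - iou, 1 - dice
print(dice)  # 0.5
```

In this example the IoU is 1/3, so the prediction would not count as a match under the mAP50 threshold.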

Segmentation result
● Once the best model is obtained, it
is used to predict lithologies
from core images
● Inference is very fast →
milliseconds per core image
● Can be used as a “Quick QC” for
petrophysicists to review the
automatic rock typing result
Original core image Segmented core

Use case 2: Boulder detection for
seabed mapping

Seabed mapping for offshore oil
infrastructure
● Offshore oil rigs and infrastructure need careful planning for structural stability
● A side scan sonar survey is used to map the structure of the seabed and identify obstacles such as
boulders
Side scan sonar capturing the Port (P) and Starboard (S)
sides of the seabed

Keypoint Model for Seabed boulders
● Boulders are large rocks deposited on the seabed, with three dimensions: Length, Width, and
Height
● Length and Width are measured as the Length (oL) and Width (oW) of the object
● How tall is the boulder? → Height is calculated from the Shadow (oS)
● Keypoint representation → the object is the head and the shadow is the tail of the keypoint pair
Boulder center
(Head)
Shadow center
(Tail)
(Savini, 2010)
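The slides do not give the formula for height from shadow. One commonly used relation, assuming a flat seabed and similar triangles between the sonar, the top of the boulder, and the end of the shadow, is height = sonar altitude × shadow length / range to the end of the shadow. The function and parameter names below are assumptions for illustration, and the numbers are made up.

```python
def boulder_height(sonar_altitude, range_to_object, shadow_length):
    """Estimate object height from its acoustic shadow (flat-seabed assumption).

    sonar_altitude  : height of the sonar above the seabed (m)
    range_to_object : slant range from the sonar to the object (m)
    shadow_length   : length of the shadow measured along the slant range (m)
    """
    range_to_shadow_end = range_to_object + shadow_length
    if range_to_shadow_end <= 0:
        raise ValueError("range to the end of the shadow must be positive")
    return sonar_altitude * shadow_length / range_to_shadow_end


# Hypothetical survey geometry: sonar 10 m above seabed, boulder at 40 m, 10 m shadow
height = boulder_height(sonar_altitude=10.0, range_to_object=40.0, shadow_length=10.0)
print(height)  # 2.0
```

A longer shadow at the same range implies a taller boulder, which is why the shadow keypoint carries the height information.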

Boulder Keypoint annotations
Port (P) position Starboard (S) position

Model Training
● Model: YOLOv8-Pose
● Training took 50 minutes on an NVIDIA
T4 GPU (200 epochs)
● Accuracy improvement in 2 ways:
○ Augmentation
■ Cropping and zooming
■ Horizontal flip
○ Hyperparameter evolution
■ 50 iterations of hyperparameter search
■ Searching hyperparameter with the best
accuracy
■ Minimize the loss curve
Multiple experiments done during search of optimum
hyperparameters
YOLOv8-Pose architecture (Wang et al,
2024)
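The idea behind hyperparameter evolution can be sketched as a simple mutate-and-keep-the-best loop. This is a toy illustration, not the tuner used in the actual training: the fitness function is a hypothetical stand-in for a validation score, and the hyperparameter names and starting values are assumptions.

```python
import random


def fitness(params):
    """Hypothetical stand-in for a validation score (higher is better).

    A real run would train the model with `params` and return e.g. mAP50.
    """
    return -((params["lr"] - 0.01) ** 2) * 1e4 - (params["momentum"] - 0.9) ** 2


def mutate(params, rng, scale=0.2):
    """Perturb every hyperparameter by a random factor within +/- scale."""
    return {k: v * (1 + rng.uniform(-scale, scale)) for k, v in params.items()}


def evolve(initial, iterations=50, seed=0):
    """Repeatedly mutate the best-known hyperparameters and keep any improvement."""
    rng = random.Random(seed)
    best, best_score = dict(initial), fitness(initial)
    history = [best_score]
    for _ in range(iterations):
        candidate = mutate(best, rng)
        score = fitness(candidate)
        if score > best_score:
            best, best_score = candidate, score
        history.append(best_score)
    return best, history


best, history = evolve({"lr": 0.05, "momentum": 0.8}, iterations=50)
print(best, history[-1])
```

Each iteration corresponds to one experiment, and the best score never decreases because a candidate replaces the current best only when it scores higher.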

Boulder detection result
P position
S position

Generative adversarial networks (GAN)
● Generates a synthetic image that has never existed, based on an image input
● Pioneered by Ian Goodfellow (2014)
Image to Image

GAN architecture
● A GAN consists of a Generator and a Discriminator
○ Generator: generates a fake image that resembles the input image
○ Discriminator: judges whether the generated image is fake or real
● Training continues until the discriminator cannot distinguish between fake and real
Image to Image
Generator
Discriminator
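The adversarial game can be made concrete with the loss functions from the original GAN formulation (Goodfellow et al, 2014). This sketch only evaluates the losses for given discriminator outputs and does not train any network. The probability values are made up.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy of the discriminator.

    d_real: discriminator's probability that a real image is real
    d_fake: discriminator's probability that a generated image is real
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: low when the discriminator is fooled."""
    return -math.log(d_fake)


# Early in training: the discriminator easily separates real from fake
confident = discriminator_loss(d_real=0.9, d_fake=0.1)

# At equilibrium: the discriminator outputs 0.5 for everything
fooled = discriminator_loss(d_real=0.5, d_fake=0.5)

print(round(confident, 4))  # 0.2107
print(round(fooled, 4))     # 1.3863
```

When the discriminator can no longer distinguish fake from real, its loss settles at 2·ln 2 ≈ 1.386, which matches the stopping condition described above.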

Applications of GAN
Image to Image
Reconstruction of 3D model of tight sandstone
(Zhao et al, 2021)
Outcrop to seismic generation

Diffusion models
● Generates a synthetic image from a text prompt written by a human
● Examples: DALL-E by OpenAI, Imagen by Google
Text to Image

Diffusion model architecture
Text to Image

Vision transformer (ViT)
● Generates text or performs tasks based on an image input provided by a human
● Sample tasks:
○ Generating captions from image
○ Locate object in the photo
○ Question and answering based on photo
Image to Text

Applications of ViT
Image to Text
Identifying mineral from thin section
Prompt: What minerals are in this sandstone?
Locating fault from seismic image
Prompt: Where are the faults in this seismic image?

Challenges in computer vision
My paper in Springer’s Lecture Notes in Computer Science (2024)

Image quality issues
● Images can suffer from quality issues, for example:
○ Resolution reduction: blurred image due to camera movement or haze
○ Occlusion: shadowed image due to obstacles blocking the light
○ Over-exposure: image looks too bright due to excessive light exposure
○ Colour constancy: false colour, where the image is tinted towards a certain colour
Shadow Overexposure Yellow constancy
(Nuwara and Trinh, 2024)

Model generalization issues
● Most models are trained on images of ideal quality
● When tested on images with quality issues, model performance degrades
● The model cannot generalize to images with different quality issues
Image with ideal quality Image with shadow
Performance
degradation

Histogram Matching
● Histogram matching is an algorithm that
transforms an image so that its histogram
matches that of a reference image
● Steps:
○ Select a normal image as Reference
image
○ Extract the histogram of Reference
image
○ Select the image that needs to be
transformed (as Source image)
○ Extract the histogram of Source image
○ Match the histogram of Source →
Reference
● Result: Improved quality and balance of
lighting
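The steps above can be sketched for a single grayscale channel using cumulative histograms. This is a minimal pure-Python illustration with images given as flat lists of 0-255 intensities. For colour images the same mapping would be applied to each channel separately. The pixel values are made up.

```python
def cumulative_counts(pixels, levels=256):
    """Cumulative histogram: number of pixels with intensity <= each level."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    total, cum = 0, []
    for c in counts:
        total += c
        cum.append(total)
    return cum


def match_histogram(source, reference, levels=256):
    """Map source intensities so the source CDF follows the reference CDF."""
    src_cum = cumulative_counts(source, levels)
    ref_cum = cumulative_counts(reference, levels)
    n_src, n_ref = len(source), len(reference)
    mapping, j = [], 0
    for i in range(levels):
        # Find the smallest reference level whose CDF reaches the source CDF
        # at level i (compared as integers to avoid floating-point issues).
        while j < levels - 1 and ref_cum[j] * n_src < src_cum[i] * n_ref:
            j += 1
        mapping.append(j)
    return [mapping[p] for p in source]


# Dark Source image and a well-lit Reference image
source = [0, 0, 1, 1]
reference = [100, 100, 200, 200]
matched = match_histogram(source, reference)
print(matched)  # [100, 100, 200, 200]
```

The dark Source pixels are remapped onto the intensity range of the Reference, which is the lighting correction described in the result bullet.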

Model Stacking
Model stacking is used to boost the performance of an object detection model on low-quality images by
combining 2 or more models to balance the performance of each model
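The slides do not specify how the models are combined. One simple possibility, shown here purely as an assumption, is to pool the detections of all models and keep only the highest-scoring box in each group of overlapping boxes (non-maximum suppression). Boxes are (x1, y1, x2, y2) and all values are made up.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def stack_detections(model_outputs, iou_threshold=0.5):
    """Pool (box, score) detections from several models and suppress duplicates."""
    pooled = sorted(
        (det for output in model_outputs for det in output),
        key=lambda det: det[1],
        reverse=True,
    )
    kept = []
    for box, score in pooled:
        # Keep a box only if it does not overlap a higher-scoring kept box
        if all(box_iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical outputs: model B finds an extra object that model A missed
model_a = [((10, 10, 50, 50), 0.9)]
model_b = [((12, 12, 52, 52), 0.6), ((100, 100, 140, 140), 0.7)]

combined = stack_detections([model_a, model_b])
print(combined)  # [((10, 10, 50, 50), 0.9), ((100, 100, 140, 140), 0.7)]
```

The combined result keeps the confident detection from one model and the object that only the other model found.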

Improved result on low-quality images
Shadow on object
Light exposure
BEFORE AFTER

Conclusion
● Computer vision makes a huge impact in broad areas of geoscience
● Two use cases are presented, using segmentation and object detection workflows
● Generative AI is shaping the future of AI implementation in geoscience
