Christin et al, MEE 2019 Common questions about how and when to use deep learning.

Wearn et al, Nature AI 2019 Artificial intelligence (AI) promises to be an invaluable tool for nature conservation, but its misuse could have severe real-world consequences for people and wildlife.

Brodrick et al, TREE 2019 A walkthrough of how to use CNNs for ecological applications.

Lamba et al, Curr.Biol.2019 Current and future applications of supervised deep learning in environmental conservation.

Christin et al, MEE 2020 Deep learning model testing and verification.

Haibe-Kains et al, Nature 2020 On the importance of transparency and reproducibility in artificial intelligence (in medicine).

Suresh et al, arxiv 2020 Identifying issues that commonly arise in ML: historical bias, representation bias, measurement bias, aggregation bias, evaluation bias and deployment bias.

Hoye et al, bioRxiv 2020 Deep learning and computer vision will transform entomology.

Corcoran et al, MEE 2021 Automated detection of wildlife using drones: Synthesis, opportunities and constraints.

Weinstein et al, JAE 2017 (old) Brief primer on ecological computer vision to outline its goals, tools and applications to animal ecology.

Baraniuk et al, PNAS 2020 Special issue in PNAS "The Science of Deep Learning"

Geirhos et al, Nature AI 2020 How CNNs learn "shortcuts" that are not generalizable to new images. An enormous issue to tackle.

Camera traps & animal species identification

Tabak et al, MEE 2018 ResNet-18 architecture and 3,367,383 images to automatically classify wildlife species from camera trap images obtained from five states across the United States. Datasets and R package MLWIC available (see next sections).

Tabak et al, Eco. Evol. 2020 Updated version of the previous paper, with more variation in the backgrounds associated with each species. Adds an "empty-animal" model that determines whether an image is empty or contains an animal. 3 million camera trap images from 18 studies in 10 states in the USA.

Norouzzadeh, PNAS 2018 AlexNet, VGG, GoogLeNet, ResNet. Identifies, counts, and describes the behaviors of 48 species in the 3.2 million-image Snapshot Serengeti dataset. Detects empty images + identifies species + counts animals.

Beery et al, arxiv 2018 Inception-v3 model pretrained on ImageNet. Faster-RCNN model using two different backbones, ResNet-101 and Inception-ResNet-v2. State-of-the-art algorithms show excellent performance when tested at the same location where they were trained. However, generalization to new locations is poor.

Schneider et al, arxiv 2019 ResNet-101 architecture + Faster R-CNN outperforms YOLOv2 on camera trap images using the Reconyx Camera Trap and the Snapshot Serengeti data sets.

Willi et al, MEE 2018 ResNet-18 to identify species on four datasets (Table 1). Shows the differences of performance between training from scratch and transfer learning. (Fig 3,4)

Gomez-Villa et al, Eco Info 2017 Serengeti dataset analyzed with AlexNet, VGGNet, GoogLeNet and ResNets.

Chen et al, Eco Evol 2017 AlexNet based. A dataset consisting of 8,368 images of wild and domestic animals in a farm setting.

Ahumada et al, Env. Cons. 2019 Presents Wildlife Insights, a platform that can classify your images automatically. Uses a pretrained Inception model, fine-tuned with 18 million images from partners, which will be retrained when new data come in. Not available yet; data will be published with the platform.

Falzon et al, preprint 2019 ClassifyMe is software for automated animal detection on camera trap images. The user can download 5 different models trained on different datasets and run the detection on their own machine. Uses YOLOv2 and does not allow retraining on the user's own dataset. Not available yet.

Norouzzadeh, MEE 2020 Uses active learning: after training, the model can access a bank of unlabeled images and request annotations for specific images to use in a further round of training. It also uses Faster R-CNN to find animals; the images are then cropped to remove the background and finally classified with ResNet-50.

Schneider, Eco Evol 2020 DenseNet201, Inception-ResNet-V3, InceptionV3, NASNetMobile, MobileNetV2, and Xception. Parks Canada dataset containing 47,279 images collected from 36 locations with 55 animal species. Classifications with <500 images had low recall + classifying species from untrained locations were less accurate

Shahinfar et al, Eco. Info. 2020 Discussion of the high level of image similarity (which reduces CNN performance) and the number of images needed to achieve good performance (fewer than 300 images per class gives poor performance). Data from Australia, Serengeti and Wisconsin.

Villon et al, Sci. Reports 2020 Propose to apply a post-processing step on the CNN outputs in order to accept or reject its classification decision. Tuning a risk threshold specific to each class using a second and independent database.
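
The accept/reject post-processing described in this entry can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `max_error_rate` parameter and the rule for choosing each threshold (the lowest confidence cutoff whose accepted predictions stay within a target error rate on the independent set) are all assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def tune_thresholds(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    max_error_rate: float,
) -> Dict[int, float]:
    """Pick one confidence threshold per predicted class on a held-out set.

    For each class, the threshold is the lowest top-score cutoff such that
    the predictions at or above it have an error rate <= max_error_rate.
    (Hypothetical selection rule, chosen for illustration.)
    """
    by_class: Dict[int, List[Tuple[float, bool]]] = {}
    for row, label in zip(scores, labels):
        pred = max(range(len(row)), key=row.__getitem__)
        by_class.setdefault(pred, []).append((row[pred], pred == label))

    thresholds: Dict[int, float] = {}
    for cls, items in by_class.items():
        items.sort(key=lambda pair: pair[0], reverse=True)  # most confident first
        best = float("inf")  # reject everything unless some cutoff is safe
        errors = 0
        for n, (conf, correct) in enumerate(items, start=1):
            errors += not correct
            # Only consider a cutoff once all predictions tied at this score are counted.
            last_of_tie = n == len(items) or items[n][0] < conf
            if last_of_tie and errors / n <= max_error_rate:
                best = conf
        thresholds[cls] = best
    return thresholds


def classify_or_reject(row: Sequence[float], thresholds: Dict[int, float]) -> Optional[int]:
    """Return the predicted class, or None when the score is below that class's threshold."""
    pred = max(range(len(row)), key=row.__getitem__)
    return pred if row[pred] >= thresholds.get(pred, float("inf")) else None


# Toy held-out set: softmax scores for 2 classes and their true labels.
held_out_scores = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.45, 0.55]]
held_out_labels = [0, 0, 1, 1, 0]
thresholds = tune_thresholds(held_out_scores, held_out_labels, max_error_rate=0.1)
```

With these toy values, a new image scored `[0.65, 0.35]` would be rejected (returned as `None`) rather than assigned to class 0, because its confidence falls below the threshold tuned for that class.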

Beery et al, arxiv 2018 A pipeline that takes advantage of a pre-trained general animal detector. Paper along with Microsoft AI for Earth's MegaDetector code.

Whytock et al, MEE 2021 Classifies 26 Central African forest mammal and bird species (or groups). Trained on a small dataset (300,000 images) but generalizes to fully independent data. Three primary sources of error were identified: over-exposed images, under-exposed images and mis-labeled images. An iterative approach: training, validation, error correction and model updating. Uses ResNet50 in a Python library.

Yang et al, Ecology and evolution Ensemble learning to detect empty images. 135 camera traps from Lhasa mountains. Uses AlexNet + Inception + ResNet with a voting system.
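
A voting ensemble of the kind mentioned in this entry can be sketched in a few lines. The entry does not state the exact voting rule, so simple majority voting is assumed here, and the tie-breaking choice (keep the image as "animal" so that it is not discarded) is a hypothetical design decision.

```python
from collections import Counter
from typing import Sequence


def majority_vote(votes: Sequence[str]) -> str:
    """Return the label chosen by most models.

    Ties fall back to "animal" so that uncertain images are kept for
    human review (an assumption, not taken from the paper).
    """
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "animal"
    return ranked[0][0]


# One vote per model, e.g. from AlexNet, Inception and ResNet classifiers.
decision = majority_vote(["empty", "empty", "animal"])
```

With three models and two labels a tie cannot occur; the fallback only matters when an even number of models is used.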

Videos / sequence data:

Beery et al, arxiv 2020. Context-RCNN: adding a memory bank from context features enhances object detection. Relies on a pre-trained single-frame Faster R-CNN with a ResNet-101 backbone.

Shashidhara et al, arxiv 2020 Using sequence information in camera trap images to improve identification performance. Three methods including background subtraction and an LSTM model.

Conway et al, Ecosphere 2019 An approach that combines a standard CNN summarizing each video frame with a recurrent neural network (RNN) that models the temporal component of video. Used for salmon and penguins. The combined RNN-CNN led to a relative improvement in test set classification accuracy over an image-only model of 25% for penguins.

Animal species identification

Waldchen, MEE 2018 A focus on deep learning neural networks as a technology that enabled breakthroughs in automated species identification

Parham et al, Proc. 2018 A 5-component detection pipeline for use in a computer vision-based animal recognition system. Using YOLO.

Villon et al, Eco. Info. 2018 GoogLeNet architecture. Identification of fish species on underwater images. Tries different datasets for training (with and without parts of fish, with and without the environment around fish). The average accuracy is the same for all datasets but there are differences between species. Also adds decision rules after training to improve performance.

Huang et al, Neurocomputing 2019 Tests the effect of 3 data-augmentation methods on underwater images: turbulence simulation, perspective transformation and illumination simulation. It improves the results more than standard data augmentation.

Milosevic et al, SciToEnv 2019 Uses ResNet50 to classify 10 different species of larvae with very good accuracy. Also uses Grad-CAM, which reveals relevant information about what the model uses.

Terry et al, MEE 2019 Improving performance of a CNN using contextual information (location, date) to identify 18 ladybird species. Using the R package Keras

Arje et al, MEE 2020 Robot machine + classification of 12 insect species with InceptionV3. Biomass prediction with a mixed linear model from image features.

Hansen et al, Eco Evo 2020 Inception-v3 model on an image database of 65,841 museum specimens comprising 361 carabid beetle species. Prediction at species and genus level.

Spiesman et al, Sci. Rep. 2021 Images from BugGuide were identified by expert naturalists. Initial dataset comprised over 120,000 images belonging to 42 species. A minimum of 150 images was required for a species to be included in the analysis, but error rates decrease with the number of images per species. Using InceptionV3.

Pollen / plant / wood

Abrams et al, Ecoinfo 2019 Microscope slides, automatically extracting images of all individual pollen grains. Training with 122,000 pollen grains, from 347 flowers of 83 species of 17 families. Validating against pollen grains from bumblebee samples. MATLAB image toolbox + ResNet-18.

Mäder et al, MEE 2021 The Flora Incognita app – interactive plant species identification. CNN + deep feedforward network that uses location embeddings and similarity learning for predicting likely species at a given location.

Hwang et al, Plant Methods 2021 Reviews of workflows of CV-based wood identification systems.

Schiller et al, Sci. Rep. 2021 Predicting plant traits using parallel branches to incorporate the different input data types. A branch processing the bioclimatic data consisted of dense layers + a CNN processing the images + concatenating the two branches.

Segmentation/Masking (i.e. detecting exact contours)

James et al, MEE 2020 Detecting plant species from drone images. Semantic segmentation using U-Net. The trained model was integrated to interact with the drone using Android technology.

Abrams et al, Ecoinfo 2019 Based on the U-Net. Segmenting habitat images of tropical rainforests. Trained with 800 canopy images and 700 understory images.

Brodrick et al, TREE 2019 Segmentation of coral reefs.

Bayr et al, Eco. Info. 2019 Classification of woody vegetation using a homemade CNN. Classify each pixel in 50x50 images.

Wu et al, Nature Com 2019 Removing background using mask segmentation. U-Net outperformed Mask R-CNN

Guirado et al, Remote Sensing 2020 Estimating tree cover with an Inception v3 CNN. Images from Google Maps corresponding to the FAO’s GDA 0.5 ha forest and non-forest plots + the Northwestern Polytechnical University NWPU-RESISC45 dataset, a set of publicly available reference orthoimages for the classification of remotely sensed images.

Kattenborn et al, Sci. Rep. 2019 Segmentation with U-Net. Mapping vegetation communities.

Song et al, Sensors 2021 Semantic segmentation to delineate coral. Using ResNet34 as a backbone network, the proposed model extracts coral features in the images and performs semantic segmentation.

Face recognition / Re-Identification (Re-Id)

Schofield et al, Sci. Adv. 2019 Chimpanzee face recognition from videos in the wild using deep learning. Annotation with VIA software. Detection with SSD with VGG-16.

Hansen et al, Comp. Ind. 2018 Pig-face recognition with a VGG-face model and a home-made 6-layer CNN implemented in Keras, using data augmentation (image rotation). Grad-CAM (gradient-weighted class activation mapping) shows which regions of an input image activate the network for a given class.

He et al, arxiv 2019 Red panda face recognition in 3 steps: face detection with YOLOv2, face alignment with U-Net and face identification with VGG-16. They used 2877 images of 51 pandas. 93% top-1 ranking accuracy, and 91% without face alignment.

Schneider et al, MEE 2019 Summary of past approaches for re-ID (re-identifying an individual animal upon re-encounter). Presents different deep learning methods for it: CNN and Siamese networks, as well as different metrics: verification, closed-set and open-set identification. Gives recommendations for building a re-ID dataset.

Bogucki et al, Cons. Bio. 2019 A 3-CNN pipeline for whale Re-ID: a first CNN to find and crop the head of the whale, a second CNN to find 2 keypoints, used to orient the head and re-crop more precisely, and a third CNN to classify and identify the whale.

Ferreira et al, MEE 2020 Re-Id on 3 small bird species (10, 10 and 30 individuals). Masks of birds are extracted with Mask-RCNN and then classified with VGG-19 to predict identity. 800+ pictures per individual are used for VGG-19 training. They use data augmentation and also add blur/noise to be closer to the test set. Achieves around 90% accuracy for the 3 species but doesn't work well with new birds.

Körschens et al, arxiv 2018 Automatic Identification of Elephants. Yolo is used to crop the head. ResNet50 was modified to extract features not from the last layer before the classification layer, but from earlier activation layers + PCA to reduce dimension + SVM to classify.

Moskvyak et al, arxiv 2019 Re-id of manta rays. The network is optimized using the semi-hard triplet loss function, adapting FaceNet. The distance between the learned embedding points provides a dissimilarity measure.
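
The triplet loss and the "semi-hard" condition named in this entry can be written out as a small sketch. It follows the FaceNet formulation (squared Euclidean distances between embeddings); the function names, the toy 2-D embeddings and the margin value of 0.2 are illustrative assumptions, not values taken from the manta ray paper.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """Hinge loss pushing the negative at least `margin` further away than the positive."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def is_semi_hard(anchor, positive, negative, margin: float = 0.2) -> bool:
    """A semi-hard negative is further than the positive but still inside the margin."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return d_ap < d_an < d_ap + margin


# Toy 2-D embeddings: same individual (positive) vs. a different one (negative).
anchor, positive = [0.0, 0.0], [0.6, 0.0]
semi_hard_negative = [0.7, 0.0]   # slightly further than the positive
easy_negative = [2.0, 0.0]        # already far away, contributes no loss
loss = triplet_loss(anchor, positive, semi_hard_negative)
```

Semi-hard triplets are the ones that still produce a non-zero loss without being dominated by the very hardest (often mislabeled or ambiguous) negatives, which is why they are commonly selected during training.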

Bouma et al, arxiv 2019 Dolphin identification using batch-hard triplet loss function. Imbalanced dataset containing 3544 images of 185 individuals. Based on ResNet-50, with an output layer of size 128.

Schneider et al, IEEE 2019 Re-id of humans, chimpanzees, whales, fruit flies, and octopus. Five siamese similarity comparison networks based on AlexNet, VGG-19, DenseNet201, MobileNetV2, and InceptionV3.

Miele et al, MEE 2021 Giraffe re-id using triplet loss and CNN + clustering in image similarity networks.

Shi et al, Inter. Zoology 2020 Individual identification of 40 tigers. The number of images collected from each tiger was approximately 200 (!!!). Using a 9-layer deep CNN developed in Keras.

Charpentier et al, Science Adv 2020 Computing face similarities between mandrills, using transfer learning with VGGFace and an SVM classifier. About 16,000 portrait images of 276 different mandrills.

Chen et al, Eco Evo 2020 Panda face recognition algorithm: segmenting into facial regions with ResNet + six neurons corresponding to six affine transformation parameters, which are used to align the segmented panda + classification with ResNet.

Clapham et al, Eco Evo 2020 Facial recognition of brown bears. 4,675 images of 132 individuals. FaceNet approach: (a) Face detection, (b) Face reorientation and cropping, (c) Face encoding (embedding), and (d) Face classification. ResNet-34 model and similarity metric using DLib + linear SVM.

Vidal et al, arxiv 2021 Review of image-based identification. Summarizing works on Panda, Tiger, Ray, Whale, Dolphin, Seal, Chimpanzee and Brown bear.

Kulits et al, arxiv 2021 ElephantBook: A Semi-Automated System for Elephant Re-Id. Faster R-CNN to detect ears + Matching Ear Contours with CurvRank.


Torney et al, MEE 2019 Counting wildebeest on aerial images. Each photo is divided into 40 sub-images of 864x864 for training. Slightly modifies the YOLOv3 architecture to adapt it to the problem: fewer anchors, different shapes of anchor boxes, only one scale (instead of 3, since objects are always at the same distance), and a changed loss function to reduce false positives. Gives very high accuracy (similar to expert labels) and speed.

Gray et al, MEE 2018 Counting sea turtles on drone images. Relatively small dataset: 467 photos, but each one is divided into 2,800 sub-images of 100x100 pixels for the CNN input. CNN of modest size: 4 convolutions + 2 dense layers. The model finds 9% more turtles than there actually are.

Guirado et al, Sci.Rep 2019 Whale counting in satellite and aerial images. GoogleNet Inception v3 + Faster R-CNN, a two-step CNN-based approach capable of counting whales in vast areas with a reduced computational cost. The first CNN filters out water and potential false positives (ships, foam and rocks) while keeping candidate images to be analyzed later by the second, much slower CNN.

Masteling et al, Plant Methods 2020 Counting germinated seeds under the microscope with YOLO.

Ditria et al, Front Marine Sci 2020 Fish Abundance Using Object Detection with Mask-RCNN. Using videos from submerged action cameras.

Bowler et al, Remote Sensing 2020 Counting albatrosses from space with a 31 cm resolution sensor (satellite images 500 × 500, four multispectral bands: red, green, blue and near-infrared). Using the U-Net architecture, which was originally designed for biomedical image segmentation. U-Net works by classifying every pixel in the image into a class (here albatross and non-albatross).

Gökhan Akçay et al, Animals 2020 Counting birds (of 38 species, without differentiating) with Fast-RCNN from photographs. Full resolution, but dividing images into slices of height 600 pixels.

Hong et al, Sensors 2019 Counting birds with Yolo (the fastest), Fast-RCNN (the most accurate) and RetinaNet from pictures taken by unmanned aerial vehicle (drone) in various environments. Full resolution but dividing into sub-images of 600 × 600 pixels.

Borowicz et al, Sci Rep 2018 Counting penguins from unmanned aerial vehicle imagery (drone) using NVidia DetectNet. Using 512 × 512 sub-images. The manually-labelled training data constituted 0.18% of the imaged area and 0.34% of the imaged penguins providing a massive decrease in manual labour required.

Duporge et al, Remote Sensing Eco. Cons. 2020 Counting elephants in high-resolution satellite images in heterogeneous landscapes. 1125 elephants were identified in the training image dataset using LabelImg. Using the TensorFlow Object Detection API. Images were sliced into 600x600 pixel sub-images with 50 pixels overlap.
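
Several entries in this group slice large images into fixed-size sub-images before detection. The sketch below computes tile boxes for the 600x600 pixel tiles with 50 pixels of overlap mentioned in this entry. The function names are hypothetical, and the edge handling (shifting the last tile so it ends flush with the image border) is an assumption, since the entry does not say how borders were treated.

```python
from typing import List, Tuple


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles along one axis so that [0, length) is fully covered."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last tile sits flush with the image edge
    return origins


def tile_boxes(
    width: int, height: int, tile: int = 600, overlap: int = 50
) -> List[Tuple[int, int, int, int]]:
    """(left, top, right, bottom) boxes of all tiles, row by row."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


# A 1150 x 600 pixel image yields two tiles sharing a 50-pixel-wide strip.
boxes = tile_boxes(1150, 600)
```

The overlap keeps an animal that straddles a tile border fully visible in at least one tile; detections from overlapping tiles then usually need de-duplication before counting.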

Pose, shape, behavior, tracking

Mathis et al, Nature Neuroscience 2018 Pose estimation of user-defined body parts with deep learning. A small number of training images (~200) can be sufficient to train this network. ResNet-50 + deconvolution layers that produce probability maps (probability density represents the evidence that a body part is in a particular location). Implemented in TensorFlow. See also the DeepLabCut Model Zoo.

Zuffi et al, ICCV 2019 Neural network to predict 3D pose, shape and texture of zebras.

Graving et al, eLife 2020 DeepPoseKit, a software toolkit for animal pose estimation. A CNN to automatically estimate the locations of an animal’s body parts directly from images

Pereira et al, Nature Meth. 2019 A fully convolutional architecture that learns a mapping from raw images to a set of confidence maps, interpreted as the 2D probability distribution (that is, a heat map) centered at the spatial coordinates of each body part within the image.
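
The confidence maps described in this entry can be illustrated with a small sketch that builds a Gaussian heat map centered on one body part and decodes the keypoint back from it. The function names and the `sigma` value are assumptions for illustration; a real pipeline would produce one such map per body part as the training target.

```python
import math
from typing import List, Tuple


def confidence_map(
    height: int, width: int, cx: float, cy: float, sigma: float = 2.0
) -> List[List[float]]:
    """Unnormalized 2D Gaussian with value 1.0 at the body-part location (cx, cy)."""
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def peak(conf_map: List[List[float]]) -> Tuple[int, int]:
    """(x, y) of the highest-confidence pixel, i.e. the decoded keypoint."""
    _, x, y = max(
        (value, x, y)
        for y, row in enumerate(conf_map)
        for x, value in enumerate(row)
    )
    return x, y


# Target map for a body part annotated at column 6, row 3 of a 10 x 8 image.
target = confidence_map(height=8, width=10, cx=6, cy=3)
decoded = peak(target)
```

At inference time the same `peak` step would be applied to the map predicted by the network rather than to the ground-truth target.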

Roberts et al, Preprint 2020 Animal Behavior Prediction with Long Short-Term Memory, using RNN (recurrent neural networks) in Keras.

Bozek et al, Nature Comm. 2021 A CNN segmentation architecture to automatically identify bees, using TensorFlow. Tracking individuals over time.

Dunn et al, Nature Meth. 2021 Pose detection in rats and mice. Exploiting mathematical relationships between camera positions to build a 3D feature space. Training a 3D U-net, implemented in Keras, using ground-truth 3D labels.

Hahn-Klimroth, Ecol Evol 2021 Pose estimation for African ungulates, using Mask R‐CNN with ResNet‐101 for object detection + action (standing, lying—head up, and lying—head down) classification using EfficientNet B3

Model interpretation / Inference of visual patterns

Miao et al, Scientific reports Uses several methods to understand and interpret the weights after training. The model is VGG16 (and ResNet50), trained on 111,000 camera trap images of 20 species from Gorongosa National Park. They use GG-CAM, GBP and Grad-CAM to extract localized visual features of single images, Mutual Information (MI) to generalize within-species features and hierarchical clustering to inspect the visual similarities between species.

Wu et al, Nature Com 2019 Deep learning model generates a 2048-dimension feature vector that predicts each moth species’ elevation based on colour and shape features. Discriminative visual features with CAM + saliency maps were obtained by computing the gradient of outputs with respect to input images in order to highlight input regions that cause the most change in the outputs.


Stowell et al, MEE 2019 Binary classification for presence or absence of birds. Audio is divided into 10 s clips. Using spectrogram + CNN.

Sethi et al, PNAS 2020 VGGish was trained by Google to perform general-purpose audio classification using YouTube-8M data. Once trained, the final layer was removed from the network, leaving a 128-dimensional acoustic feature embedding as the CNN output. Representing sound with UMAP.

Mac Aodha et al, PLoS Comp Biol Passive acoustic sensing for bats. Localizing echolocation calls (bat species emit ultrasonic pulses). Using log spectrogram + home-made CNN, with a sliding window of 23 ms.

Shyam Madhusudhana et al, J.R.Interface 2021 CNNs designed to detect song notes (calls) in short-duration audio segments, for fin whale (Balaenoptera physalus) vocalizations. DenseNet with use of a custom pre-conditioning layer in place of the first convolution layer, using the frequency axis of the input spectrogram.

Bravo-Sanchez et al, Sci. Rep. 2021 Open-source CNN architecture (SincNet), designed to process raw human-speech samples, used to identify species from raw digital waveforms sourced from a publicly released and richly annotated birdsong dataset (NIPS4Bplus).

Rumelt et al, Ecol. Evol. 2021 Acoustic monitoring of birds was conducted using ten SWIFT ARUs. CNN classifier operating on the log-Mel-weighted spectrogram at a resolution of 512 × 512 px.

On board systems / Embedded Neural Networks

Klemens et al, MEE 2021 Camera trap system with motion-detection trigger, using Raspberry Pi. Images are written to disk when a percentage of pixels differ between one image and the next.
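
The pixel-difference trigger described in this entry can be sketched as follows. This is a minimal illustration on grayscale frames stored as nested lists; the function names and both threshold values (`pixel_delta`, `min_fraction`) are hypothetical and not taken from the paper.

```python
from typing import Sequence

Frame = Sequence[Sequence[int]]  # grayscale image, one list of 0-255 values per row


def changed_fraction(prev: Frame, curr: Frame, pixel_delta: int = 25) -> float:
    """Fraction of pixels whose gray level changed by more than pixel_delta."""
    total = 0
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            changed += abs(a - b) > pixel_delta
    return changed / total if total else 0.0


def should_save(
    prev: Frame, curr: Frame, min_fraction: float = 0.05, pixel_delta: int = 25
) -> bool:
    """Trigger a write to disk when enough pixels differ between consecutive frames."""
    return changed_fraction(prev, curr, pixel_delta) >= min_fraction


# Two 4 x 4 frames: the second has 2 of 16 pixels brightened by a passing animal.
previous = [[10] * 4 for _ in range(4)]
current = [row[:] for row in previous]
current[0][0] = 200
current[1][1] = 200
triggered = should_save(previous, current)
```

The per-pixel tolerance absorbs sensor noise, while the fraction threshold controls how large a moving object must be before a frame is kept.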

Dominguez-Morales, Sensors 2021 Embedding a neural network (no convolution) in a microcontroller that collects data from sensors (accelerometer, gyroscope, and magnetometer) to detect three different horse gaits. Models obtained with Keras (.h5) were then converted to TensorFlow Lite (.tflite) models in order to integrate them in the different microcontrollers and hardware platforms

Cunha et al, arxiv 2021 Filtering empty images in embedded systems. Detectors (EfficientDet) achieve superior performance, eliminating at least 10% more empty images than classifiers (EfficientNet) with comparable latencies.
