TY - JOUR
T1 - Endless Forams
T2 - >34,000 modern planktonic foraminiferal images for taxonomic training and automated species recognition using convolutional neural networks
AU - Hsiang, Allison
AU - Brombacher, Anieke
AU - Rillo, Marina
AU - Mleneck-Vautravers, Maryline
AU - Conn, Stephen
AU - Lordsmith, Sian
AU - Jentzen, Anna
AU - Henehan, Michael
AU - Metcalfe, Brett
AU - Fenton, Isabel
AU - Wade, Bridget
AU - Fox, Lyndsey
AU - Meilland, Julie
AU - Davis, Catherine
AU - Baranowski, Ulrike
AU - Groeneveld, Jeroen
AU - Edgar, Kirsty
AU - Movellan, Aurore
AU - Aze, Tracy
AU - Dowsett, Harry
AU - Miller, Giles
AU - Rios, Nelson
AU - Hull, Pincelli
PY - 2019/7
Y1 - 2019/7
N2 - Planktonic foraminiferal species identification is central to many paleoceanographic studies, from selecting species for geochemical research to elucidating the biotic dynamics of microfossil communities relevant to physical oceanographic processes and interconnected phenomena such as climate change. However, few resources exist to train students in the difficult task of discerning amongst closely related species, resulting in diverging taxonomic schools that differ in species concepts and boundaries. This problem is exacerbated by the limited number of taxonomic experts. Here we document our initial progress toward removing these confounding and/or rate-limiting factors by generating the first extensive image library of modern planktonic foraminifera, providing digital taxonomic training tools and resources, and automating species-level taxonomic identification of planktonic foraminifera via machine learning using convolutional neural networks. Experts identified 34,640 images of modern (extant) planktonic foraminifera to the species level. These images are served as species exemplars through the online portal Endless Forams (endlessforams.org) and a taxonomic training portal hosted on the citizen science platform Zooniverse (zooniverse.org/projects/ahsiang/endless-forams/). A supervised machine learning classifier was then trained with ~27,000 images of these identified planktonic foraminifera. The best-performing model provided the correct species name for an image in the validation set 87.4% of the time and included the correct name in its top three guesses 97.7% of the time. Together, these resources provide a rigorous set of training tools in modern planktonic foraminiferal taxonomy and a means of rapidly generating assemblage data via machine learning in future studies for applications such as paleotemperature reconstruction.
AB - Planktonic foraminiferal species identification is central to many paleoceanographic studies, from selecting species for geochemical research to elucidating the biotic dynamics of microfossil communities relevant to physical oceanographic processes and interconnected phenomena such as climate change. However, few resources exist to train students in the difficult task of discerning amongst closely related species, resulting in diverging taxonomic schools that differ in species concepts and boundaries. This problem is exacerbated by the limited number of taxonomic experts. Here we document our initial progress toward removing these confounding and/or rate-limiting factors by generating the first extensive image library of modern planktonic foraminifera, providing digital taxonomic training tools and resources, and automating species-level taxonomic identification of planktonic foraminifera via machine learning using convolutional neural networks. Experts identified 34,640 images of modern (extant) planktonic foraminifera to the species level. These images are served as species exemplars through the online portal Endless Forams (endlessforams.org) and a taxonomic training portal hosted on the citizen science platform Zooniverse (zooniverse.org/projects/ahsiang/endless-forams/). A supervised machine learning classifier was then trained with ~27,000 images of these identified planktonic foraminifera. The best-performing model provided the correct species name for an image in the validation set 87.4% of the time and included the correct name in its top three guesses 97.7% of the time. Together, these resources provide a rigorous set of training tools in modern planktonic foraminiferal taxonomy and a means of rapidly generating assemblage data via machine learning in future studies for applications such as paleotemperature reconstruction.
KW - planktonic foraminifera
KW - global community macroecology
KW - supervised machine learning
KW - convolutional neural networks
KW - marine microfossils
KW - species identification
UR - http://www.scopus.com/inward/record.url?scp=85069895900&partnerID=8YFLogxK
U2 - 10.1029/2019PA003612
DO - 10.1029/2019PA003612
M3 - Article
SN - 2572-4517
VL - 34
SP - 1157
EP - 1177
JO - Paleoceanography and Paleoclimatology
JF - Paleoceanography and Paleoclimatology
IS - 7
ER -