Exploring Related Records in the Flowering Plant Genus Senegalia in Brazil

Joe Miller

2023-01-18

GBIF

In 2020 GBIF released a news item “New data-clustering feature aims to improve data quality and reveal cross-dataset connections.” In short, we run an algorithm across the datasets shared with GBIF that searches for similarities in occurrence data fields such as location, identifiers and dates. Please read this blog post by Marie Grosjean and Tim Robertson for more details on how it works. In general, this lets us identify linkages between specimens, DNA sequences and literature citations.
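To make the idea concrete, here is a deliberately simplified sketch of cross-dataset matching. It is not GBIF's clustering algorithm (the Grosjean and Robertson post describes the real one). It only groups records that share a normalized scientific name, an event date and coordinates rounded to two decimal places, and keeps the groups that span more than one dataset. The records, the choice of fields and the rounding precision are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A record is a plain dict keyed by Darwin Core-style term names (illustrative).
Record = Dict[str, object]
Key = Tuple[str, str, float, float]


def cluster_key(rec: Record, precision: int = 2) -> Key:
    """Coarse matching key: normalized name, event date, rounded coordinates.

    Rounding to `precision` decimal places is an assumption; it lets slightly
    different coordinates for the same collecting event share a key.
    """
    return (
        " ".join(str(rec["scientificName"]).split()).lower(),
        str(rec["eventDate"]),
        round(float(rec["decimalLatitude"]), precision),
        round(float(rec["decimalLongitude"]), precision),
    )


def find_candidate_clusters(records: List[Record]) -> List[List[Record]]:
    """Group records by key and keep groups drawn from two or more datasets."""
    buckets: Dict[Key, List[Record]] = defaultdict(list)
    for rec in records:
        buckets[cluster_key(rec)].append(rec)
    return [
        group
        for group in buckets.values()
        if len({rec["datasetKey"] for rec in group}) > 1
    ]


# Made-up records: a herbarium specimen and a sequence record of the same event.
records = [
    {"datasetKey": "herbarium-A", "scientificName": "Senegalia polyphylla",
     "eventDate": "1998-03-12", "decimalLatitude": -15.7801, "decimalLongitude": -47.9292},
    {"datasetKey": "sequences-B", "scientificName": "senegalia  polyphylla",
     "eventDate": "1998-03-12", "decimalLatitude": -15.7799, "decimalLongitude": -47.9290},
    {"datasetKey": "herbarium-A", "scientificName": "Senegalia tenuifolia",
     "eventDate": "2001-07-30", "decimalLatitude": -22.90, "decimalLongitude": -43.20},
]

for group in find_candidate_clusters(records):
    print([rec["datasetKey"] for rec in group])
```

A fixed rounding grid can still separate near-identical coordinates that straddle a boundary, which is one reason a production approach would need distance tolerances rather than exact keys.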

Checklist publishing on GBIF - some explanations on taxonID, scientificNameID, taxonConceptID, acceptedNameUsageID, nameAccordingTo

Marie Grosjean

2022-12-08

GBIF data publishing

When data publishers publish checklists, they use a Darwin Core Archive with a Taxon Core. Although the Taxon Core terms are already described here, what exactly to put in which field can sometimes be confusing, and there is a lot to read on the topic (see, for example, https://github.com/tdwg/tnc/issues/1). Here I am sharing a summary of an email conversation we had on the Helpdesk with some data publishers about some of the Taxon Core fields.
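One thing that often trips people up is how the ID fields point at each other inside a checklist. In GBIF checklists, a synonym's acceptedNameUsageID normally holds the taxonID of its accepted name in the same file. The sketch below uses made-up placeholder rows and only a handful of Taxon Core terms, and checks that every acceptedNameUsageID resolves to a taxonID. It is a minimal consistency check under those assumptions, not a replacement for GBIF's own validation.

```python
import csv
import io
from typing import Dict, List

# Made-up Taxon Core rows; the names are placeholders, not real taxa.
TAXON_CORE_TSV = (
    "taxonID\tscientificName\ttaxonomicStatus\tacceptedNameUsageID\n"
    "t1\tGenus alpha Author\taccepted\t\n"
    "t2\tGenus beta Author\tsynonym\tt1\n"
    "t3\tGenus gamma Author\tsynonym\tt9\n"
)


def read_taxon_core(text: str) -> List[Dict[str, str]]:
    """Parse a tab-separated Taxon Core file that has a header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def dangling_accepted_ids(rows: List[Dict[str, str]]) -> List[str]:
    """Return taxonIDs whose acceptedNameUsageID matches no taxonID in the file."""
    known = {row["taxonID"] for row in rows}
    return [
        row["taxonID"]
        for row in rows
        if row.get("acceptedNameUsageID") and row["acceptedNameUsageID"] not in known
    ]


rows = read_taxon_core(TAXON_CORE_TSV)
print(dangling_accepted_ids(rows))
```

Here the synonym t3 points at t9, which does not exist in the file, so it is reported.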

Which data can be shared through GBIF and what cannot

Cecilie Svenningsen

2022-11-17

GBIF

Preparing a dataset to be shared on GBIF.org can be a daunting task, and many publishers realize that not all their data fit the Darwin Core standard (DwC) and the extensions GBIF uses to structure, standardize and display biodiversity data. This blog post covers what data fit in GBIF, gives examples of data that do not fit the current GBIF format, and provides guidance on how you can share relevant data in a metadata-only dataset or through a third party.

Finding data gaps in the GBIF backbone taxonomy

John Waller

2022-04-25

GBIF

When publishers supply GBIF with a scientific name, this name is sometimes not found in the GBIF taxonomic backbone. In these cases, the occurrence record gets a data quality flag called “taxon match higher rank”. This means that GBIF was only able to match the name to a higher rank (genus, family, order …).

The World Checklist of Vascular Plants (Fabaceae)

John Waller

2022-03-28

GBIF

The World Checklist of Vascular Plants (WCVP): Fabaceae is a new GBIF-mediated checklist that drastically increases the coverage of the family Fabaceae in the GBIF backbone.

Using Apache Arrow and Parquet with GBIF-mediated occurrences

John Waller and Carl Boettiger

2022-02-18

GBIF

As described in a previous blog post, GBIF now has database snapshots of occurrence records on AWS. These allow users to access large tables of GBIF-mediated occurrence records from Amazon S3 remote storage, free of charge.

Identifying potentially related records - How does the GBIF data-clustering feature work?

Marie Grosjean and Tim Robertson

2021-11-04

GBIF data use

Many data users may suspect they’ve discovered duplicated records in the GBIF index: you download data from GBIF, analyze them and realize that some records have the same date, scientific name, catalogue number and location but come from two different publishers or have slightly different attributes. There are many valid reasons why these duplicates appear on GBIF. Sometimes an observation was recorded in two different systems, sometimes several records correspond to herbarium duplicates (you can check the work of Nicky Nicolson on the topic), sometimes a specimen was digitized twice, sometimes a record has been enriched with genetic information and republished via a different platform…

What are the flags "Collection match fuzzy", "Collection match none", "Institution match fuzzy", "Institution match none" and how to remove them?

Marie Grosjean

2021-10-11

GBIF Publishing

You are a data publisher of occurrence data through GBIF.org, care about your data quality, and wonder what to do about the issue flags that show up on your occurrences. You might have noticed a new type of flag this year relating to collection and institution codes and identifiers. These flags are the result of our attempt at linking specimen records to our Registry of Scientific Collections (GRSciColl).
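If you retrieve your records through the occurrence API, the flags appear in each record's issues list. The sketch below tallies the four GRSciColl-related flags named above across a list of records and lists the records where nothing could be linked. The constant spellings (COLLECTION_MATCH_FUZZY and so on) follow the style the API uses for issue names, but please confirm them against the API documentation; the sample records are invented.

```python
from collections import Counter
from typing import Dict, Iterable, List

# The four flags discussed in this post, spelled as API issue constants.
GRSCICOLL_FLAGS = {
    "COLLECTION_MATCH_FUZZY",
    "COLLECTION_MATCH_NONE",
    "INSTITUTION_MATCH_FUZZY",
    "INSTITUTION_MATCH_NONE",
}


def count_grscicoll_flags(records: Iterable[Dict]) -> Counter:
    """Count how often each GRSciColl-related flag appears across records."""
    counts: Counter = Counter()
    for rec in records:
        counts.update(i for i in rec.get("issues", []) if i in GRSCICOLL_FLAGS)
    return counts


def unmatched_record_keys(records: Iterable[Dict]) -> List[int]:
    """Keys of records where no institution or no collection could be linked."""
    none_flags = {"INSTITUTION_MATCH_NONE", "COLLECTION_MATCH_NONE"}
    return [rec["key"] for rec in records if none_flags & set(rec.get("issues", []))]


# Invented records, shaped like entries of an occurrence search response.
records = [
    {"key": 1, "issues": ["COLLECTION_MATCH_FUZZY", "RECORDED_DATE_INVALID"]},
    {"key": 2, "issues": ["INSTITUTION_MATCH_NONE", "COLLECTION_MATCH_NONE"]},
    {"key": 3, "issues": []},
]

print(count_grscicoll_flags(records))
print(unmatched_record_keys(records))
```

Records in the second list are likely the first place to check the institution and collection codes or identifiers you publish.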

GBIF API beginners guide

John Waller

2021-09-07

GBIF

This is a beginner's guide to the GBIF API.

The technical documentation of the GBIF API might be a bit confusing if you have never used an API before. The goal of this guide is to introduce the GBIF API to semi-technical users.

The purpose of the GBIF API is to give users safe access to GBIF databases. The GBIF API is also what GBIF.org and the rgbif R package rely on to function.
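In practice, using the API mostly means requesting a URL and reading the JSON that comes back. Below is a minimal sketch in Python using only the standard library. It builds an occurrence search URL under https://api.gbif.org/v1 with the scientificName, country, limit and offset query parameters. The helper names are my own, and the network call is kept in a separate function so the URL logic can be tried offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.gbif.org/v1"


def occurrence_search_url(scientific_name: str, country: str,
                          limit: int = 20, offset: int = 0) -> str:
    """Build a GET URL for the occurrence search endpoint.

    `limit` is the page size and `offset` the number of records to skip,
    so successive pages use offset = 0, limit, 2 * limit, ...
    """
    params = {"scientificName": scientific_name, "country": country,
              "limit": limit, "offset": offset}
    return f"{API_BASE}/occurrence/search?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Send the request and parse the JSON body (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


url = occurrence_search_url("Senegalia", "BR", limit=5)
print(url)
# data = fetch_json(url)  # uncomment to query the live API
# print(data["count"], [r["scientificName"] for r in data["results"]])
```

Pasting the printed URL into a browser shows the same JSON that rgbif and GBIF.org work with.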

Did you know that...? - some of the lesser known functionalities around GBIF.org

Andrea Hahn

2021-08-05

GBIF

During the first-ever virtual GBIF 2021 Global Nodes Meeting, GBIFS hosted a “game show”: a one-hour “battle of Nodes vs. helpdesk”. The not-so-hidden goal was to demonstrate some of the lesser known functionalities of GBIF.org through a fun, interactive session.

GBIF and Apache-Spark on AWS tutorial

John Waller

2021-06-04

GBIF

GBIF now has a snapshot of 1.3 billion occurrence records on Amazon Web Services (AWS). This guide will take you through running Spark notebooks on AWS. The GBIF snapshot is documented here.

June snapshot of https://t.co/CJaPsifdp0 occurrence data now available on the Amazon and Microsoft clouds, based on https://t.co/aGbvTisapJ. See https://t.co/lRXM2uqFh0 for more details.
— GBIF (@GBIF) June 2, 2021

You can read previous discussions about GBIF and cloud computing here. The main reason you would want to use cloud computing is to run big data queries that are slow or impractical on a local machine.

Derived datasets

Daniel Noesgaard

2021-05-20

GBIF Citation

You’ve finished an analysis using GBIF-mediated data, you’re writing up your manuscript and checking all the references, but you’re unsure of how to cite GBIF. If you Google it, you’ll probably end up reading our citation guidelines and quickly realize that GBIF is all about DOIs. Datasets have their own DOIs, and downloads of aggregated data also have their own DOIs.

But maybe you didn’t download data through the GBIF.org portal. Maybe you relied on an R package like rgbif or dismo that retrieved data synchronously from the GBIF API? Maybe a grad student downloaded it for you? Maybe you accessed and analyzed the data using a cloud computing service, like Microsoft Azure or Amazon Web Services? In any case, which DOI do you cite if you don’t have one?

GBIF and Apache-Spark on Microsoft Azure tutorial

John Waller

2021-05-19

GBIF

GBIF now has a snapshot of 1.3 billion occurrence records on Microsoft Azure.

It is hosted by the Microsoft AI for Earth program, which hosts geospatial datasets that are important to environmental sustainability and Earth science. This hosting is convenient because you can now use occurrences in combination with other environmental layers without needing to upload any of them to Azure. You can read previous discussions about GBIF and cloud computing here. The main reason you would want to use cloud computing is to run big data queries that are slow or impractical on a local machine.

The GBIF Registry of Scientific Collections (GRSciColl) in 2021

Marie Grosjean

2021-03-12

GBIF GRSciColl registry

The GBIF Registry of Scientific Collections, also known as GRSciColl, has been available on GBIF.org since 2019, but it recently got more attention when we connected it to GBIF occurrences. Now is the perfect time to share a bit of GRSciColl history and what we plan for its future.

A brief history of GRSciColl

First of all, here are a few facts about GRSciColl today, at the start of 2021:

Common things to look out for when post-processing GBIF downloads

John Waller

2021-02-17

GBIF

This post was updated on April 20, 2022 to accommodate changes to the handling of the dwc:establishmentMeans vocabulary.

Here I present a checklist for filtering GBIF downloads.

In this guide, I assume you are familiar with R. The guide is also somewhat general, so your solution might differ. It is intended to give you a checklist of common things to look out for when post-processing GBIF downloads.

(Almost) everything you want to know about the GBIF Species API

Marie Grosjean

2020-11-17

GBIF api

Today, we are talking about the GBIF Species API. Although you might not use it directly, you have probably encountered it while using the GBIF web portal:

Typing a scientific name in the GBIF Occurrence search.
Seeing a “Taxon Match Fuzzy” flag.
Using the Species Name matching tool.

This API is what allows us to navigate through the names available on GBIF. I will try to avoid repeating what you can already find in its documentation; instead, I will give an overview and answer some questions we have received in the past.
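As an example of the kind of call behind the Species Name matching tool, the sketch below builds a request to the species match endpoint (https://api.gbif.org/v1/species/match) and decides whether a returned match deserves a closer look, based on its matchType field (EXACT, FUZZY, HIGHERRANK or NONE). FUZZY and HIGHERRANK roughly correspond to the “Taxon Match Fuzzy” and “taxon match higher rank” flags on occurrences. The helper names and the example response dictionary are illustrative assumptions; a real response contains more fields.

```python
from typing import Dict
from urllib.parse import urlencode

MATCH_URL = "https://api.gbif.org/v1/species/match"


def species_match_url(name: str, kingdom: str = "") -> str:
    """Build a species match request; kingdom is an optional filter."""
    params = {"name": name}
    if kingdom:
        params["kingdom"] = kingdom  # helps disambiguate names shared across kingdoms
    return f"{MATCH_URL}?{urlencode(params)}"


def needs_review(match: Dict) -> bool:
    """True unless the name was matched exactly to a backbone name."""
    return match.get("matchType", "NONE") != "EXACT"


def summarize(match: Dict) -> str:
    """One-line description of a parsed match response."""
    match_type = match.get("matchType", "NONE")
    if match_type == "NONE":
        return "no match"
    return f"{match_type}: {match.get('scientificName')} ({match.get('rank')})"


print(species_match_url("Senegalia polyphylla", kingdom="Plantae"))

# Illustrative response for a misspelled epithet that only matched the genus.
example = {"matchType": "HIGHERRANK", "scientificName": "Senegalia", "rank": "GENUS"}
print(needs_review(example), summarize(example))
```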
