Open Science Preconditions: Identifiability

VO Sharing is daring: Open Science approaches to Digital Humanities

Please read the lesson script below and complete the tasks.

Questions, remarks, issues? Participate in the Zoom meeting on Mon, 23.03.2020, 5 p.m. - 6 p.m.!

Mon, 23.03., 16:45 - 18:15: Open Science Preconditions: Identifiability

Before we move into this week's topic, let's briefly follow up on the sessions we've had so far. You discovered a number of new concepts as well as the importance of acronyms for the digital humanities in general and Open Science in particular. If you feel a little overwhelmed and "lost in translation", I recommend taking a look at the article "Do you speak open science? Resources and tips to learn the language".

With this, we have finished the first section of our course, which was the Introduction to Open Science. Let's move on to the second section, which will explain all the preconditions that have to be fulfilled in order for Open Science to even become possible. In this section, we will discuss many aspects of the legal framework of science and research in general, and Open Science especially. But before we can get to this, we have to focus on the most basic of preconditions, which is this week's topic: Identifiability. What do we mean by this term, and why is it important? We can get a clearer picture with an example:

Have you ever met someone named Maria Gruber? You may have really liked this person and tried to get in touch with them by looking them up on Facebook or Twitter, but without success: there are simply too many Maria Grubers in Vienna, and to make matters worse, half of them chose a photo of their cat as their profile picture. So how are you ever going to find them?

This problem is the precise reason why we need unique identifiers that we can attach to entities so that we can distinguish them from other entities. If there are 20 Viennese Maria Grubers on the social network of your choice, we will therefore identify them as Maria_Gruber1, Maria_Gruber2, Maria_Gruber3 and so on. However, this might not entirely solve our problem: The Maria Gruber we met might be a person who will at some point in their life decide to change their gender identity - and with it their name. Our Maria Gruber could thus turn into Markus Gruber, and with this adjustment, their identifier would also change: Maria_Gruber3 might turn into Markus_Gruber21. Everyone who had befriended Maria_Gruber3 would suddenly miss this person in their list of friends and would no longer be able to find them.

This explains why uniqueness is not the only important criterion for an identifier: it also has to be persistent. Our example social network will therefore likely not identify all its Maria Grubers as Maria_Gruber1, Maria_Gruber2, Maria_Gruber3 and so on, but rather as user0001, user0820, user0866, user6904 and so on. As a result, if the data entered in the fields "name" or "surname" of the account of user0866 are changed, the befriended accounts will remain connected to it, because the ID "user0866" persists even if Maria becomes Markus.
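The difference between a name-derived identifier and a persistent one can be made concrete with a small sketch. The Python code below is only an illustration under assumed names: the `Network` and `Account` classes, their methods, and the friend "Anna Huber" are hypothetical and do not describe how any real social network is built. Each account receives an opaque ID exactly once, friendships are stored against that ID, and a name change touches only the profile fields.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Account:
    user_id: str   # opaque and persistent: assigned once, never changed
    name: str      # mutable profile data
    surname: str   # mutable profile data


class Network:
    """Toy social network (hypothetical) that links accounts by ID, not by name."""

    def __init__(self):
        self._accounts = {}        # user_id -> Account
        self._friends = {}         # user_id -> set of befriended user_ids
        self._counter = count(1)   # source of fresh, never reused numbers

    def register(self, name, surname):
        # The ID carries no information about the person, so it never
        # needs to change when the person's data changes.
        user_id = f"user{next(self._counter):04d}"
        self._accounts[user_id] = Account(user_id, name, surname)
        self._friends[user_id] = set()
        return user_id

    def befriend(self, id_a, id_b):
        self._friends[id_a].add(id_b)
        self._friends[id_b].add(id_a)

    def rename(self, user_id, new_name):
        # Only the profile field changes; the ID and all links stay intact.
        self._accounts[user_id].name = new_name

    def friends_of(self, user_id):
        # Return the current display names of all befriended accounts.
        return sorted(
            f"{self._accounts[f].name} {self._accounts[f].surname}"
            for f in self._friends[user_id]
        )


network = Network()
anna = network.register("Anna", "Huber")
maria = network.register("Maria", "Gruber")
network.befriend(anna, maria)
network.rename(maria, "Markus")
print(network.friends_of(anna))  # ['Markus Gruber']
```

Had the friendship been stored under a key such as "Maria_Gruber3", the rename would have broken it; with the opaque ID, the connection survives.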

In the field of research and science, we face the very same issue. Here, it is usually the funders who want to correctly identify the funding applicants to determine the quality of their work and scientific profile in order to make a decision about whether or not to provide them with the financial resources they request. For this reason, a "social network" providing persistent identifiers (PIDs) for researchers was established: ORCiD (Open Researcher and Contributor ID). This is ORCiD's mission:

"ORCID provides a persistent digital identifier (an ORCID iD) that you own and control, and that distinguishes you from every other researcher. You can connect your iD with your professional information — affiliations, grants, publications, peer review, and more. You can use your iD to share your information with other systems, ensuring you get recognition for all your contributions, saving you time and hassle, and reducing the risk of errors." (ORCiD: Home)

Task 1

Please visit the ORCiD website and create an account. Be aware that an ORCiD is like a matriculation number: You will have it for the rest of your life (this is the very meaning of "persistence"), so DO NOT LOSE YOUR ID AND DO NOT FORGET YOUR PASSWORD.
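The iD you receive consists of 16 characters in four hyphen-separated groups, and its last character is a check character computed from the first 15 digits with the ISO 7064 MOD 11-2 algorithm, as documented by ORCID. The following Python sketch shows how such an iD could be checked for typing errors; the function names are my own, and the code only tests the format and the checksum, not whether the iD is actually registered.

```python
import re

# Four groups of four characters; the last character may be a digit or "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(first_15_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in first_15_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """Return True if the string has the ORCID iD format and a correct checksum."""
    if ORCID_PATTERN.fullmatch(orcid) is None:
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample iD of the fictitious researcher
# Josiah Carberry that ORCID uses in its own documentation.
print(is_well_formed_orcid("0000-0002-1825-0097"))  # True
print(is_well_formed_orcid("0000-0002-1825-0098"))  # False
```

A check character of this kind catches most single mistyped digits, which is one reason why a system can reject a wrongly entered iD before looking anything up.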

Now that we have identified the PID for persons in the research environment, we also need identifiers for other types of entities. If we are, say, a funding body that, thanks to ORCiD, can persistently identify a unique person as researcher Maria Gruber, that doesn't help us much in deciding whether to fund their research if we don't know what Maria Gruber has already done in their career. Therefore, Maria will provide a list of the papers they have written on their ORCiD profile page, including links to these papers. Now let's assume that Maria worked on a project at the University of Graz and published two papers during this time, which they uploaded to the repository of the University of Graz; on their ORCiD profile, they included the links to these files. But then Maria was offered a permanent position at the University of Salzburg, and when they switched jobs, they took their publications with them by removing them from the Graz repository and uploading them anew to the repository of the University of Salzburg. The files, i.e. the digital objects, are the same, but the URLs for finding them have changed, so the links on the ORCiD profile now lead nowhere. In order to keep such digital objects uniquely and persistently identifiable, the Digital Object Identifier (DOI) system was established.

"The DOI system provides a technical and social infrastructure for the registration and use of persistent interoperable identifiers, called DOIs, for use on digital networks." (DOI: Home)

Now that we have gained a basic understanding of the two most important PID systems in the context of Open Science, we can dive a little deeper.
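To see what a DOI changes in the case of Maria's move from Graz to Salzburg, consider the following minimal Python sketch. It assumes a toy registry that maps a DOI to the current location of the object; the class `ToyDoiRegistry`, the DOI `10.1234/example.5678` and the `example.org` URLs are hypothetical and serve only as illustration. The real DOI system is far more elaborate, but the principle is the same: citations and profiles store the DOI, and only the registry entry is updated when the file moves. The one real element is the public resolver at https://doi.org/, which forwards a DOI to its currently registered location.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # the public DOI resolver


def doi_to_url(doi):
    """Build the resolver link for a DOI (no network request is made)."""
    return DOI_RESOLVER + quote(doi, safe="/")


class ToyDoiRegistry:
    """Hypothetical stand-in for a registry: DOI -> current location."""

    def __init__(self):
        self._locations = {}

    def register(self, doi, url):
        key = doi.lower()  # DOI names are case-insensitive
        if key in self._locations:
            raise ValueError(f"DOI already registered: {doi}")
        self._locations[key] = url

    def update_location(self, doi, new_url):
        key = doi.lower()
        if key not in self._locations:
            raise KeyError(doi)
        # The identifier stays the same; only its target changes.
        self._locations[key] = new_url

    def resolve(self, doi):
        return self._locations[doi.lower()]


registry = ToyDoiRegistry()
paper = "10.1234/example.5678"  # hypothetical DOI
registry.register(paper, "https://repository.graz.example.org/paper1")

# Maria moves to Salzburg: the file gets a new URL, the DOI does not change.
registry.update_location(paper, "https://repository.salzburg.example.org/paper1")
print(registry.resolve(paper))

# Link for a real DOI from this session's reading list:
print(doi_to_url("10.5334/dsj-2017-028"))  # https://doi.org/10.5334/dsj-2017-028
```

If Maria's ORCiD profile lists the DOI rather than the repository URL, nothing on the profile has to be edited after the move.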

Task 2

Read the paper "Connecting the Persistent Identifier Ecosystem: Building the Technical and Human Infrastructure for Open Research" by Angela Dappert, Adam Farquhar, Rachael Kotarski, and Kirstie Hewlett to understand how ORCiD and DOI interact with each other and what other PIDs are out there.

Task 3

Find out the DOI of the paper you read in Task 2 and the ORCiDs of its authors. Can you find other identifiers and/or PIDs that are connected to the paper?

We have now understood why PIDs are important for research as such and the general Open Science landscape. To contextualize PIDs with the research infrastructure world we encountered in our last session, we will now learn why infrastructures rely on the PID system.

Task 4

Please visit the PARTHENOS training module "Introduction to Research Infrastructures", which you already got to know and love last week. To successfully complete this week's session of the course, please read the (short) section "Persistent identifiers".

We have now learned that DOI is not just any PID system, but also a standard in the field of PIDs. Standards play a vital role in both Digital Humanities and Open Science. Standards make things identifiable because they ensure that the same "language" is used to talk about these things, and thus they enable interoperability.

Task 5

The final task in this session is to understand why standards are so important to guarantee identifiability. To do so, visit the PARTHENOS training module "Introduction to Research Infrastructures" and read and watch all content in the section "What are standards?". Subsequently, explore the "Standardization Survival Kit" (SSK): Can you find out what standards historians use for archiving?

Reading

Masuzzo, P., & Martens, L. (2017). Do you speak open science? Resources and tips to learn the language. PeerJ Preprints, 5, e2689v1. https://doi.org/10.7287/peerj.preprints.2689v1
Dappert, A., Farquhar, A., Kotarski, R., & Hewlett, K. (2017). Connecting the Persistent Identifier Ecosystem: Building the Technical and Human Infrastructure for Open Research. Data Science Journal, 16, 28. https://doi.org/10.5334/dsj-2017-028
Gasparyan, A. Y., Yessirkepov, M., Gerasimov, A. N., Kostyukova, E. I., & Kitas, G. D. (2016). Scientific author names: errors, corrections, and identity profiles. Biochemia Medica, 169–173. https://doi.org/10.11613/BM.2016.017
Illmayer, K. (2019). Openness in Forschungsprojekten: PARTHENOS Standardization Survival Kit (SSK). Mitteilungen der Vereinigung Österreichischer Bibliothekarinnen und Bibliothekare, 72(2), 392–407. https://doi.org/10.31263/voebm.v72i2.3221
Marín-Arraiza, P. (2019). ORCID im Open Science-Szenario: Chancen für wissenschaftliche Bibliotheken. Mitteilungen der Vereinigung Österreichischer Bibliothekarinnen und Bibliothekare, 72(2), 478–493. https://doi.org/10.31263/voebm.v72i2.2811