Sampling strategy and responses
Distributing the researcher survey and analysing its data involved a multi-step process, starting from contact records sourced from the 2023 ORCID database. A comprehensive data extraction retrieved researcher IDs, given names, family names, countries of residence, and email addresses. Records lacking publicly available information were excluded from the dataset to comply with privacy norms and regulations. This filtering yielded 333,105 email addresses corresponding to 213,511 unique researcher IDs.
To capture a globally representative sample of researchers' opinions, the country data was used to map the number of researchers per region, as shown in Table 1.
Table 1. Original distribution of ORCID IDs by region.
| Continent | Region | ORCID IDs |
|---|---|---|
| Africa | Northern Africa | 4,753 |
| | Sub-Saharan Africa | 4,666 |
| Americas | Latin America and the Caribbean | 41,979 |
| | Northern America | 22,162 |
| Asia | Central Asia | 1,030 |
| | South-eastern Asia | 10,325 |
| | Southern Asia | 22,318 |
| | Eastern Asia | 26,942 |
| | Western Asia | 13,307 |
| Europe | Northern Europe | 15,935 |
| | Southern Europe | 32,115 |
| | Western Europe | 21,346 |
| | Eastern Europe | 18,505 |
| Oceania | Australia and New Zealand | 5,932 |
| | Micronesia | 20 |
| | Melanesia | 80 |
| | Polynesia | 36 |
Analysis of the regional classifications used in this study revealed a noticeable geographical imbalance in the distribution of researchers. Three adjustments were proposed to address this disparity. The first consolidated all regions within Oceania into a single entity, because the minimal number of records in some areas did not warrant separate categorisation. The second applied the same logic to Central Asia, which was merged with Eastern Asia. The third concerned the Latin America and the Caribbean region, where the number of researcher IDs was disproportionately large and did not reflect the diversity of the region's geography and scientific systems. Consequently, the region was divided along intermediate-region lines, separating Central America and the Caribbean from South America. The resulting, more balanced record counts are detailed in Table 2.
Table 2. Adjusted distribution of ORCID IDs by region.
| Continent | Region | ORCID IDs |
|---|---|---|
| Africa | Northern Africa | 4,753 |
| | Sub-Saharan Africa | 4,666 |
| Americas | Central America and the Caribbean | 7,847 |
| | Northern America | 22,162 |
| | South America | 34,791 |
| Asia | South-eastern Asia | 10,325 |
| | Southern Asia | 22,318 |
| | Eastern and Central Asia | 27,972 |
| | Western Asia | 13,307 |
| Europe | Northern Europe | 15,935 |
| | Southern Europe | 32,115 |
| | Western Europe | 21,346 |
| | Eastern Europe | 18,505 |
| Oceania | Oceania | 6,013 |
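The merging step can be illustrated with a minimal Python sketch. It is not the authors' actual procedure: the `MERGE_MAP` dictionary and the `consolidate` function are hypothetical names introduced here, and only the Asian rows of Table 1 are used as input.

```python
from collections import defaultdict

# Hypothetical mapping from original region names to merged region names.
# Regions that are not listed keep their original name.
MERGE_MAP = {
    "Central Asia": "Eastern and Central Asia",
    "Eastern Asia": "Eastern and Central Asia",
}


def consolidate(counts, merge_map):
    """Sum ORCID ID counts over regions that map to the same merged name."""
    merged = defaultdict(int)
    for region, n_ids in counts.items():
        merged[merge_map.get(region, region)] += n_ids
    return dict(merged)


# Asian rows taken from Table 1.
asia_original = {
    "Central Asia": 1030,
    "South-eastern Asia": 10325,
    "Southern Asia": 22318,
    "Eastern Asia": 26942,
    "Western Asia": 13307,
}

asia_adjusted = consolidate(asia_original, MERGE_MAP)
print(asia_adjusted["Eastern and Central Asia"])  # 27972, as in Table 2
```

Keeping the mapping in a plain dictionary makes each adjustment explicit and easy to audit against the tables.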
Despite these adjustments, regional disparities in the number of researchers persisted. To mitigate this, the sample size needed from each region to represent its research community was estimated. The estimate drew on data from the UNESCO Institute for Statistics (researchers in R&D per million people, in FTE) and the United Nations Statistics Division (Standard Country or Area Codes for Statistical Use, M49). Using the most recent data available for each country, the sample size for each group was calculated with the Cochran formula, commonly used to determine sample sizes in surveys and experiments. The calculation assumed a 95% confidence level, a 5% margin of error, and a population proportion of 0.5, the value that maximises the required sample size.
Table 3 shows the number of available ORCID IDs in the database and the number of researchers per region, together with three results of the sampling process: the number of respondents needed to represent each research community, the number of emails sent, and the response rate needed to reach the desired number of responses.
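As an illustration of the calculation described above, the following Python sketch applies the Cochran formula, n0 = z² · p · (1 − p) / e², with z = 1.96 for a 95% confidence level. The optional finite population correction, the function name `cochran_sample_size`, and the population of 10,000 researchers are assumptions made for this example; the source does not state whether a correction was applied.

```python
import math


def cochran_sample_size(z=1.96, p=0.5, e=0.05, population=None):
    """Return the Cochran sample size, rounded up to a whole respondent.

    z          -- z-score for the confidence level (1.96 for 95%)
    p          -- presumed population proportion
    e          -- margin of error
    population -- optional population size; when given, the standard
                  finite population correction is applied (an assumption
                  of this sketch, not confirmed by the source)
    """
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# Very large research community: no correction needed.
print(cochran_sample_size())  # 385

# Hypothetical region with 10,000 researchers.
print(cochran_sample_size(population=10_000))  # 370
```

Because p · (1 − p) peaks at p = 0.5, that choice gives the largest, and therefore most conservative, sample size for a given margin of error.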
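The response rate needed for a region follows from dividing the required number of respondents by the number of emails sent. The sketch below shows that arithmetic; the function name `required_response_rate` and the figures (385 respondents, 5,000 emails) are hypothetical and are not taken from Table 3.

```python
def required_response_rate(needed_responses, emails_sent):
    """Return the fraction of recipients who must respond to hit the target."""
    if emails_sent <= 0:
        raise ValueError("emails_sent must be positive")
    return needed_responses / emails_sent


# Hypothetical figures for one region, for illustration only.
rate = required_response_rate(needed_responses=385, emails_sent=5000)
print(f"{rate:.1%}")  # 7.7%
```

A value above 1 would indicate that the target cannot be reached with the addresses available for that region.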
Table 3. Sample size and response rate calculations.