# Load libraries
library(bibliometrix)
library(tidyverse)
library(ggplot2)
library(maps)

# Import raw Scopus data
scopus_raw <- convert2df(
  file = "_small lake__.csv [no keyword limitation]",
  dbsource = "scopus",
  format = "csv"
)
## 
## Converting your scopus collection into a bibliographic dataframe
## 
## Done!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!
# Run bibliometric analysis on the raw (uncleaned) records
bib_analysis <- biblioAnalysis(scopus_raw, sep = ";")
bib_summary <- summary(bib_analysis, k = 10, pause = FALSE)
## 
## 
## MAIN INFORMATION ABOUT DATA
## 
##  Timespan                              1898 : 2025 
##  Sources (Journals, Books, etc)        2236 
##  Documents                             13181 
##  Annual Growth Rate %                  0 
##  Document Average Age                  15.5 
##  Average citations per doc             34.62 
##  Average citations per year per doc    2.167 
##  References                            715186 
##  
## DOCUMENT TYPES                     
##  article               11637 
##  book                  117 
##  book chapter          356 
##  conference paper      370 
##  data paper            16 
##  editorial             21 
##  erratum               4 
##  letter                20 
##  note                  33 
##  review                601 
##  short survey          6 
##  
## DOCUMENT CONTENTS
##  Keywords Plus (ID)                    31456 
##  Author's Keywords (DE)                23153 
##  
## AUTHORS
##  Authors                               31119 
##  Author Appearances                    56688 
##  Authors of single-authored docs       1228 
##  
## AUTHORS COLLABORATION
##  Single-authored docs                  1558 
##  Documents per Author                  0.424 
##  Co-Authors per Doc                    4.3 
##  International co-authorships %        25.9 
##  
## 
## Annual Scientific Production
## 
##  Year    Articles
##     1898        1
##     1926        1
##     1943        1
##     1947        1
##     1948        2
##     1956        1
##     1961        2
##     1965        1
##     1967        1
##     1970        5
##     1971        7
##     1972        6
##     1973       17
##     1974       11
##     1975       13
##     1976       19
##     1977       21
##     1978       32
##     1979       37
##     1980       40
##     1981       37
##     1982       50
##     1983       56
##     1984       58
##     1985       48
##     1986       93
##     1987       79
##     1988      106
##     1989       89
##     1990      117
##     1991       83
##     1992      119
##     1993       99
##     1994      107
##     1995      103
##     1996      170
##     1997      187
##     1998      206
##     1999      206
##     2000      208
##     2001      201
##     2002      220
##     2003      275
##     2004      225
##     2005      285
##     2006      302
##     2007      271
##     2008      285
##     2009      345
##     2010      398
##     2011      363
##     2012      459
##     2013      463
##     2014      495
##     2015      532
##     2016      595
##     2017      579
##     2018      601
##     2019      603
##     2020      683
##     2021      727
##     2022      743
##     2023      700
##     2024      390
##     2025        1
## 
## Annual Percentage Growth Rate 0 
## 
## 
## Most Productive Authors
## 
##    Authors        Articles Authors        Articles Fractionalized
## 1       ZHANG Y        112   SMOL JP                         21.2
## 2       WANG Y          83   BRILLO BBC                      20.9
## 3       WANG J          79   BIRKS HJB                       20.1
## 4       LI Y            75   ZHANG Y                         19.5
## 5       LI J            71   ANDERSON NJ                     17.3
## 6       SMOL JP         68   WANG Y                          15.8
## 7       RASK M          62   RASK M                          14.8
## 8       WANG L          60   JR                              14.7
## 9       JR              57   MOISEENKO TI                    14.5
## 10      BIRKS HJB       56   SCHINDLER DW                    14.2
## 
## 
## Top manuscripts per citations
## 
##                                                                     Paper         
## 1  SCHWARZENBACH RP, 2005, ENVIRONMENTAL ORGANIC CHEMISTRY                        
## 2  HYSLOP EJ, 1980, J FISH BIOL                                                   
## 3  REYNOLDS CS, 2006, THE ECOLOGY OF PHYTOPLANKTON                                
## 4  KIDD KA, 2007, PROC NATL ACAD SCI U S A                                        
## 5  ALLAN JD, 2007, STREAM ECOL: STRUCT AND FUNCT OF RUNNING WATERS: SECOND EDITION
## 6  POFF NL, 1997, J NORTH AM BENTHOLOGICAL SOC                                    
## 7  CORRELL DL, 1998, J ENVIRON QUAL                                               
## 8  LIMA SL, 1998, BIOSCIENCE                                                      
## 9  ADRIAN R, 2009, LIMNOL OCEANOGR                                                
## 10 WERNER EE, 2003, ECOLOGY                                                       
##                                                DOI   TC TCperYear  NTC
## 1  10.1002/0471649643                              3743     170.1 64.5
## 2  10.1111/j.1095-8649.1980.tb02775.x              3684      78.4 28.9
## 3  10.1017/CBO9780511542145                        1944      92.6 33.0
## 4  10.1073/pnas.0609568104                         1644      82.2 28.9
## 5  10.1007-978-1-4020-5583-6                       1451      72.5 25.5
## 6  10.2307/1468026                                 1441      48.0 22.5
## 7  10.2134/jeq1998.00472425002700020004x           1429      49.3 18.1
## 8  10.2307/1313225                                 1428      49.2 18.1
## 9  10.4319/lo.2009.54.6_part_2.2283                1324      73.6 28.8
## 10 10.1890/0012-9658(2003)084[1083:AROTII]2.0.CO;2 1319      55.0 20.6
## 
## 
## Corresponding Author's Countries
## 
##           Country Articles   Freq  SCP MCP MCP_Ratio
## 1  USA                1872 0.2058 1452 420     0.224
## 2  CHINA              1142 0.1255  804 338     0.296
## 3  CANADA              935 0.1028  695 240     0.257
## 4  UNITED KINGDOM      464 0.0510  281 183     0.394
## 5  GERMANY             419 0.0461  229 190     0.453
## 6  FINLAND             338 0.0372  268  70     0.207
## 7  SWEDEN              314 0.0345  168 146     0.465
## 8  POLAND              272 0.0299  227  45     0.165
## 9  FRANCE              249 0.0274  124 125     0.502
## 10 JAPAN               226 0.0248  186  40     0.177
## 
## 
## SCP: Single Country Publications
## 
## MCP: Multiple Country Publications
## 
## 
## Total Citations per Country
## 
##            Country      Total Citations Average Article Citations
## 1  USA                            85290                     45.56
## 2  CANADA                         34587                     36.99
## 3  UNITED KINGDOM                 25085                     54.06
## 4  CHINA                          22503                     19.70
## 5  GERMANY                        14531                     34.68
## 6  SWEDEN                         14507                     46.20
## 7  FINLAND                        11441                     33.85
## 8  AUSTRALIA                      10022                     50.11
## 9  FRANCE                          9505                     38.17
## 10 SWITZERLAND                     7209                     49.72
## 
## 
## Most Relevant Sources
## 
##                                        Sources        Articles
## 1  HYDROBIOLOGIA                                           573
## 2  FRESHWATER BIOLOGY                                      295
## 3  JOURNAL OF PALEOLIMNOLOGY                               280
## 4  SCIENCE OF THE TOTAL ENVIRONMENT                        260
## 5  LIMNOLOGY AND OCEANOGRAPHY                              219
## 6  QUATERNARY SCIENCE REVIEWS                              184
## 7  WATER (SWITZERLAND)                                     180
## 8  HOLOCENE                                                178
## 9  CANADIAN JOURNAL OF FISHERIES AND AQUATIC SCIENCES      174
## 10 JOURNAL OF HYDROLOGY                                    137
## 
## 
## Most Relevant Keywords
## 
##    Author Keywords (DE)      Articles   Keywords-Plus (ID)     Articles
## 1             CLIMATE CHANGE      408 LAKES                        2237
## 2             LAKES               377 ARTICLE                      1351
## 3             EUTROPHICATION      356 CLIMATE CHANGE               1274
## 4             HOLOCENE            330 WATER QUALITY                1245
## 5             PHYTOPLANKTON       294 LAKE                         1226
## 6             ZOOPLANKTON         280 UNITED STATES                1021
## 7             DIATOMS             267 ENVIRONMENTAL MONITORING     1012
## 8             WATER QUALITY       218 CHINA                         975
## 9             LAKE SEDIMENTS      204 EUTROPHICATION                907
## 10            LAKE                196 ECOSYSTEM                     888
p1 <- plot(bib_analysis, k = 10, pause = FALSE)

1. Introduction

Bibliometric analysis is a quantitative method used to evaluate and map the scientific literature of a specific field. This study uses bibliometric analysis to examine the research landscape on small lakes from 1990 to 2024, using data retrieved from the Scopus database.

The analysis aims to identify publication trends, leading authors, influential journals, productive countries, and dominant research themes in the field of small lakes research.

2. Data and Methods

Data was retrieved from the Scopus database using “small lake*” and related terms as search keywords. The raw dataset consisted of 13,181 records spanning from 1898 to 2025.

The dataset was cleaned using the following steps:

  • Filtered to include only publications from 1990 to 2024
  • Retained only Articles and Reviews as document types
  • Removed duplicate records based on title (72 duplicate titles were found in the raw data)

The final cleaned dataset comprises 11,391 documents published across 1,729 sources by 28,700 authors.

All analyses and visualizations were performed in R using the bibliometrix, tidyverse, ggplot2, and maps packages.

3. Data Cleaning

The raw dataset retrieved from Scopus consisted of 13,181 records spanning from 1898 to 2025. Before proceeding with the analysis, the data was inspected and cleaned through the following steps.

3.1 Checking for Missing Values

The first step was to check whether key bibliometric fields contained any missing values.

colSums(is.na(scopus_raw[, c("AU", "TI", "PY", "SO", "DE", "ID", "AB", "TC", "CR", "C1")]))
## AU TI PY SO DE ID AB TC CR C1 
##  0  0  0  0  0  0  0  0  0  0

All key fields returned zero NA values, so no imputation was applied. Note that is.na() counts only true NA entries; fields exported as empty strings would not be flagged by this check.
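
Because is.na() only detects NA, a stricter check can also count blank strings. The following is a minimal sketch on a toy data frame rather than the Scopus export; the helper name count_missing is hypothetical and the column values are made up for illustration.

```r
# Toy stand-in for a bibliographic data frame (not the Scopus export)
toy <- data.frame(
  AU = c("SMITH J", "LEE K", NA),
  DE = c("LAKES;EUTROPHICATION", "", "  "),
  TC = c(10, 0, 3),
  stringsAsFactors = FALSE
)

# Count entries that are NA or, for character columns, blank after trimming
count_missing <- function(df) {
  vapply(df, function(col) {
    blank <- if (is.character(col)) !is.na(col) & trimws(col) == "" else FALSE
    sum(is.na(col) | blank)
  }, integer(1))
}

na_only  <- colSums(is.na(toy))   # what the NA-only check reports
na_blank <- count_missing(toy)    # NA plus blank strings
print(rbind(na_only, na_blank))
```

In this toy example the DE column passes the NA-only check but contains two blank entries, which is the situation the stricter count is meant to expose.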

3.2 Checking Publication Years

Next, the distribution of publication years was examined to identify any anomalous or sparse records.

table(scopus_raw$PY)
## 
## 1898 1926 1943 1947 1948 1956 1961 1965 1967 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 
##    1    1    1    1    2    1    2    1    1    5    7    6   17   11   13   19   21   32   37   40   37   50   56   58 
## 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 
##   48   93   79  106   89  117   83  119   99  107  103  170  187  206  206  208  201  220  275  225  285  302  271  285 
## 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 
##  345  398  363  459  463  495  532  595  579  601  603  683  727  743  700  390    1

The data contained records as far back as 1898, but annual counts before 1990 were low, never exceeding 106 records in a year. To ensure a meaningful and consistent trend analysis, the dataset was filtered to include only publications from 1990 onwards. Additionally, the single record from 2025 was excluded because the year is incomplete and would skew the publication trend downward.

3.3 Checking for Duplicates

sum(duplicated(scopus_raw$TI))
## [1] 72

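A total of 72 duplicate titles were identified in the raw data. Duplicates were removed to avoid inflating author and citation counts; because removal was applied after the year and document-type filters, the number of records actually dropped may be smaller than 72.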
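
Exact string comparison can miss titles that differ only in case, punctuation, or spacing. The sketch below shows one possible normalization step before calling duplicated(); it runs on toy titles, and the helper name normalize_title is hypothetical rather than part of the workflow above.

```r
# Toy titles: the first three are the same paper with different case,
# punctuation and spacing
titles <- c(
  "Small lakes and climate change",
  "SMALL LAKES AND CLIMATE CHANGE.",
  "Small  lakes and climate-change",
  "Diatoms in boreal lakes"
)

# Upper-case, turn punctuation into spaces, collapse runs of whitespace
normalize_title <- function(x) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

exact_dups      <- sum(duplicated(titles))
normalized_dups <- sum(duplicated(normalize_title(titles)))
cat("exact:", exact_dups, "| normalized:", normalized_dups, "\n")
```

Whether normalization changes the count for the Scopus export depends on how consistently titles were recorded, so it is worth comparing both numbers before deciding which to use.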

3.4 Checking Document Types

table(scopus_raw$DT)
## 
##          ARTICLE             BOOK     BOOK CHAPTER CONFERENCE PAPER       DATA PAPER        EDITORIAL          ERRATUM 
##            11637              117              356              370               16               21                4 
##           LETTER             NOTE           REVIEW     SHORT SURVEY 
##               20               33              601                6

The dataset contained 11 document types, including articles, reviews, conference papers, book chapters, books, editorials, and others. For bibliometric analysis, only peer-reviewed articles and reviews were retained, as these are the most comparable and citable document types in the literature.

3.5 Applying the Cleaning Steps

Based on the inspection above, the following cleaning steps were applied:

  • Filter to publications from 1990 to 2024
  • Retain only Articles and Reviews
  • Remove duplicate records based on title (72 duplicate titles in the raw data)

# Filter by year
scopus_clean <- scopus_raw[scopus_raw$PY >= 1990 & scopus_raw$PY <= 2024, ]

# Keep only Articles and Reviews
scopus_clean <- scopus_clean[scopus_clean$DT %in% c("ARTICLE", "REVIEW"), ]

# Remove duplicates
scopus_clean <- scopus_clean[!duplicated(scopus_clean$TI), ]

# Confirm final dataset dimensions
dim(scopus_clean)
## [1] 11391    53

The cleaned dataset comprises 11,391 documents ready for bibliometric analysis.
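
Recording the row count after each step makes the cleaning easier to audit, and shows why the number of duplicates dropped at the end can differ from the number counted in the raw data. This is a sketch on toy records; the column names PY, DT, and TI follow the field tags used above, and all values are illustrative.

```r
# Toy records; rows 1 and 6 share a title, but row 6 is a conference paper
toy <- data.frame(
  PY = c(1995, 1985, 2010, 2010, 2025, 2001),
  DT = c("ARTICLE", "ARTICLE", "REVIEW", "BOOK", "ARTICLE", "CONFERENCE PAPER"),
  TI = c("A", "B", "C", "D", "E", "A"),
  stringsAsFactors = FALSE
)

steps <- c(raw = nrow(toy))

# Step 1: keep 1990-2024
by_year <- toy[toy$PY >= 1990 & toy$PY <= 2024, ]
steps["year_filter"] <- nrow(by_year)

# Step 2: keep articles and reviews
by_type <- by_year[by_year$DT %in% c("ARTICLE", "REVIEW"), ]
steps["type_filter"] <- nrow(by_type)

# Step 3: drop repeated titles
deduped <- by_type[!duplicated(by_type$TI), ]
steps["deduplicated"] <- nrow(deduped)

print(steps)

# Duplicates counted on the raw data vs. duplicates actually removed at the end
dups_in_raw  <- sum(duplicated(toy$TI))
dups_removed <- nrow(by_type) - nrow(deduped)
cat("duplicates in raw:", dups_in_raw, "| removed after filtering:", dups_removed, "\n")
```

Here the only duplicate title belongs to a record that the document-type filter had already excluded, so the final deduplication step removes nothing.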

4. Descriptive Overview

The following key metrics summarize the dataset:

Metric                            Value
Timespan                          1990–2024
Total Documents                   11,391
Total Authors                     28,700
Total Sources                     1,729
Annual Growth Rate                3.53%
Average Citations per Document    33.55
Co-Authors per Document           4.53
International Co-authorships      27.71%
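
The annual growth rate is a compound rate between the output of the first and last year. The sketch below uses the standard compound-growth form; that bibliometrix computes its figure in exactly this way is an assumption, and annual_growth is a hypothetical helper. For the raw data, whose first year (1898) and last year (2025) each contain one record, this form returns 0, which agrees with the growth rate printed in the raw-data summary.

```r
# Compound annual growth rate (%) between the first and last year
annual_growth <- function(n_first, n_last, year_first, year_last) {
  ((n_last / n_first)^(1 / (year_last - year_first)) - 1) * 100
}

# Raw data: one record in 1898 and one record in 2025
raw_rate <- annual_growth(1, 1, 1898, 2025)

# Illustrative numbers only: output doubling over ten years
toy_rate <- annual_growth(100, 200, 2000, 2010)

print(c(raw = raw_rate, toy = round(toy_rate, 2)))
```

Because only the two end years enter the formula, the result is sensitive to which years are kept, which is one reason to drop an incomplete final year before reporting it.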

5. Visualizations

5.1 Annual Scientific Production

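# Re-run the analysis on the cleaned data; p1 above was built from the raw records
bib_clean <- biblioAnalysis(scopus_clean, sep = ";")
p_clean <- plot(bib_clean, k = 10, pause = FALSE)

p_clean$AnnualScientProd +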
  labs(
    title = "Annual Scientific Production on Small Lakes (1990–2024)",
    subtitle = "Based on Scopus database | n = 11,391 documents",
    x = "Year",
    y = "Number of Articles",
    caption = "Source: Scopus | Bibliometric Analysis"
  )

5.2 Top 10 Most Productive Authors

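# Build the plot from the cleaned data; p1 above was built from the raw records
p_authors <- plot(biblioAnalysis(scopus_clean, sep = ";"), k = 10, pause = FALSE)

p_authors$MostProdAuthors +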
  labs(
    title = "Top 10 Most Productive Authors in Small Lakes Research",
    subtitle = "Based on Scopus database | n = 11,391 documents",
    x = "Number of Articles",
    y = "Authors",
    caption = "Source: Scopus | Bibliometric Analysis"
  )

5.3 Top 10 Most Relevant Journals

journal_df <- as.data.frame(table(scopus_clean$SO))
colnames(journal_df) <- c("Journal", "Articles")
journal_df <- journal_df[order(-journal_df$Articles), ][1:10, ]

ggplot(journal_df, aes(x = reorder(Journal, Articles), y = Articles)) +
  geom_col(fill = "#2171b5") +
  coord_flip() +
  labs(
    title = "Top 10 Most Relevant Journals in Small Lakes Research",
    subtitle = "Based on Scopus database | n = 11,391 documents",
    x = "Journal",
    y = "Number of Articles",
    caption = "Source: Scopus | Bibliometric Analysis"
  )

5.4 Country Scientific Production

library(maps)

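# Country counts from the cleaned data (bib_analysis above was built from the raw records)
country_df <- as.data.frame(biblioAnalysis(scopus_clean, sep = ";")$Countries)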
colnames(country_df) <- c("Country", "Articles")
country_df$Country <- stringr::str_to_title(country_df$Country)

world_map <- map_data("world")
country_df$Country[country_df$Country == "Usa"] <- "USA"
country_df$Country[country_df$Country == "United Kingdom"] <- "UK"

map_merged <- left_join(world_map, country_df, by = c("region" = "Country"))

ggplot(map_merged, aes(x = long, y = lat, group = group, fill = Articles)) +
  geom_polygon(color = "white", linewidth = 0.1) +
  scale_fill_gradient(
    low = "#c6dbef", high = "#08306b",
    na.value = "gray90",
    name = "Articles"
  ) +
  labs(
    title = "Country Scientific Production on Small Lakes Research",
    subtitle = "Based on Scopus database | n = 11,391 documents",
    caption = "Source: Scopus | Bibliometric Analysis"
  )
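
A left join leaves a country without a value whenever its name differs between the two tables, and such countries are drawn in the NA colour. The sketch below checks for unmatched names on toy vectors; the country names are illustrative and the region list is a hand-written sample, not the full table returned by map_data("world").

```r
# Country names as they might look after title-casing (illustrative)
countries <- c("Usa", "United Kingdom", "Canada", "China", "Russian Federation")

# Hand-written sample of map region names (assumption, not the full map table)
regions <- c("USA", "UK", "Canada", "China", "Russia")

# Manual recoding, in the same spirit as the "Usa" and "United Kingdom" fixes
recode <- c("Usa" = "USA", "United Kingdom" = "UK")
hit <- countries %in% names(recode)
countries[hit] <- recode[countries[hit]]

# Names that would still fail to join and appear grey on the map
unmatched <- setdiff(countries, regions)
print(unmatched)
```

Running the same setdiff() against the real region column would list every country that still needs a recoding rule.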

5.5 Top 20 Author Keywords

keywords <- scopus_clean %>%
  select(DE) %>%
  filter(!is.na(DE)) %>%
  mutate(DE = tolower(DE)) %>%
  separate_rows(DE, sep = ";") %>%
  mutate(DE = str_trim(DE)) %>%
  filter(DE != "")

top_keywords <- keywords %>%
  count(DE, sort = TRUE) %>%
  slice_head(n = 20)

ggplot(top_keywords, aes(x = reorder(DE, n), y = n)) +
  geom_col(fill = "#2171b5") +
  geom_text(aes(label = n), hjust = -0.2, size = 3, color = "gray30") +
  coord_flip() +
  labs(
    title = "Top 20 Author Keywords in Small Lakes Research",
    subtitle = "Based on Scopus database | n = 11,391 documents",
    x = "Keyword",
    y = "Frequency",
    caption = "Source: Scopus | Bibliometric Analysis"
  )
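
The raw-data summary lists both LAKES and LAKE among the top keywords, so singular and plural variants are counted separately. One possible remedy is to map variants to a preferred form before counting. The sketch below does this on toy keyword strings with a hand-made lookup table; the table entries are illustrative and would need to be curated for the real data.

```r
# Toy keyword strings in the semicolon-separated style of the DE field
de <- c("Lakes; Eutrophication", "lake;climate change", "LAKES; Diatoms", "diatom")

# Split, trim and lower-case
kw <- tolower(trimws(unlist(strsplit(de, ";", fixed = TRUE))))
kw <- kw[kw != ""]

# Hand-made lookup mapping variants to one preferred form (illustrative only)
synonyms <- c("lake" = "lakes", "diatom" = "diatoms")
hit <- kw %in% names(synonyms)
kw[hit] <- synonyms[kw[hit]]

counts <- sort(table(kw), decreasing = TRUE)
print(counts)
```

An explicit lookup table is slower to build than automatic stemming, but it keeps every merge decision visible and reviewable.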

5.6 Thematic Map

thematic_map <- thematicMap(
  scopus_clean,
  field = "DE",
  n = 250,
  minfreq = 5,
  stemming = FALSE,
  size = 0.5,
  n.labels = 1,
  repel = TRUE
)

plot(thematic_map$map) +
  labs(
    title = "Thematic Map of Small Lakes Research",
    subtitle = "Based on Scopus database | n = 11,391 documents",
    caption = "Source: Scopus | Bibliometric Analysis"
  )

5.7 Top 10 Most Cited Papers

cited_df <- scopus_clean %>%
  select(TI, AU, PY, SO, TC) %>%
  arrange(desc(TC)) %>%
  slice_head(n = 10) %>%
  mutate(
    Label = paste0(word(AU, 1), " (", PY, ")"),
    TI = str_trunc(TI, 50)
  )

ggplot(cited_df, aes(x = reorder(Label, TC), y = TC)) +
  geom_col(fill = "#2171b5") +
  geom_text(aes(label = TC), hjust = -0.2, size = 3, color = "gray30") +
  coord_flip() +
  labs(
    title = "Top 10 Most Cited Papers in Small Lakes Research",
    subtitle = "Based on Scopus database | n = 11,391 documents",
    x = "Paper",
    y = "Total Citations",
    caption = "Source: Scopus | Bibliometric Analysis"
  )
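
Taking only the first whitespace-separated token of the author field shortens multi-word surnames, and two papers with the same token and year would share a label and be stacked into a single bar. The sketch below builds labels from the full first-author name and makes repeated labels unique. It runs on toy author strings, and the " #" suffix is an arbitrary choice.

```r
# Toy author strings in the "SURNAME INITIALS;SURNAME INITIALS" style of the AU field
au <- c("VAN DER MEER J;SMITH A", "ZHANG Y;LI J", "ZHANG Y;WANG L")
py <- c(2005, 2012, 2012)

# First author = everything before the first semicolon
first_author <- trimws(sub(";.*$", "", au))

# Build labels and disambiguate repeats with a numeric suffix
label <- make.unique(paste0(first_author, " (", py, ")"), sep = " #")
print(label)
```

Unique labels matter here because reorder() and the bar geometry both group rows by label, so any repeated label would merge two papers.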

6. Conclusion

This bibliometric analysis examined the scientific landscape of small lake research using 11,391 documents retrieved from the Scopus database, covering the period from 1990 to 2024. The analysis reveals a field that has grown over more than three decades, with an annual growth rate of 3.53% and a peak publication output of 685 documents in 2022. This growth suggests increasing recognition of the ecological importance of small lakes in the global scientific community.

The United States, China, and Canada emerged as the most productive countries, while Hydrobiologia and the Journal of Paleolimnology were identified as the leading publication outlets. The high international co-authorship rate of 27.71% and an average of 4.53 co-authors per document suggest that small lake research is increasingly collaborative and globally engaged.

Thematic analysis of author keywords revealed that the dominant research themes revolve around climate change, eutrophication, phytoplankton, and paleolimnology — all of which are rooted in the natural sciences. This pattern is consistent with the broader observation that small lake research has historically been concentrated in ecological and hydrological dimensions, with comparatively little attention given to the social and economic aspects of these water bodies.

This gap represents a significant opportunity for future research. Small lakes, despite being numerically dominant and biologically active, remain underexplored in terms of their socio-economic contributions to surrounding communities — including their roles in supporting local livelihoods, fisheries, aquaculture, and rural economies. Future studies that bridge the natural and social sciences in the context of small lake research would contribute meaningfully to a more holistic understanding of these ecosystems and their value to human communities.