| Type: | Package | 
| Title: | Access to CAPES Data | 
| Version: | 0.1.0 | 
| Date: | 2024-12-17 | 
| Description: | Provides simplified access to the data from the Catalog of Theses and Dissertations of the Brazilian Coordination for the Improvement of Higher Education Personnel (CAPES, https://catalogodeteses.capes.gov.br) for the years 1987 through 2022. The dataset includes variables such as Higher Education Institution (institution), Area of Concentration (area), Graduate Program Name (program_name), Type of Work (type), Language of Work (language), Author Identification (author), Abstract (abstract), Advisor Identification (advisor), Development Region (region), State (state). | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Depends: | R (≥ 4.0.0) | 
| RoxygenNote: | 7.3.2 | 
| Imports: | arrow, dplyr, magrittr, rlang, stringr, utils | 
| Suggests: | knitr, rmarkdown | 
| VignetteBuilder: | knitr | 
| URL: | <https://github.com/hugoavmedeiros/capesR> | 
| Config/Needs/website: | tidyverse/tidytemplate | 
| NeedsCompilation: | no | 
| Packaged: | 2024-12-18 16:37:22 UTC; leite | 
| Author: | Hugo Vasconcelos Medeiros [aut, cre], Dalson Figueiredo Filho [aut], André Leite [aut] | 
| Maintainer: | Hugo Vasconcelos Medeiros <hugo.medeiros@ufpe.br> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-12-19 16:20:02 UTC | 
Synthetic CAPES Data
Description
Aggregated data from the CAPES Catalog of Theses and Dissertations, containing summarized information by year, institution, area, program, type, region, and state (UF).
Usage
capes_synthetic_df
Format
A data frame with the following columns:
- base_year
- Reference year of the data. 
- institution
- Higher Education Institution. 
- area
- Area of Concentration. 
- program_name
- Name of the Graduate Program. 
- type
- Type of work (e.g., Master's, Doctorate). 
- region
- Region of Brazil. 
- state
- Federative Unit (state). 
- n
- Total number of works. 
Source
Synthetic data created from the CAPES Catalog of Theses and Dissertations.
Examples
data(capes_synthetic_df)
head(capes_synthetic_df)
Download CAPES Data
Description
Downloads CAPES theses and dissertations data files from OSF for selected years.
Usage
download_capes_data(years, destination = tempdir(), timeout = 120)
baixar_dados_capes(years, destination = tempdir(), timeout = 120)
Arguments
| years | A vector with the desired years. | 
| destination | The directory where the files will be saved (default: temporary directory). | 
| timeout | The timeout in seconds for the download process (default: 120 seconds). | 
Value
A list of file paths for the downloaded or already existing files.
Examples
# Download data for the years 1987 and 1990
capes_files <- download_capes_data(c(1987, 1990))
Read and filter data from the CAPES Catalog of Theses and Dissertations
Description
This function combines data from multiple Parquet files and applies optional filters, including text-based searches.
Usage
read_capes_data(files, filters = list())
ler_dados_capes(files, filters = list())
Arguments
| files | A vector or list of paths to Parquet files. | 
| filters | A list of filters to apply (e.g., list(base_year = 1987, state = "SP", title = "education")). | 
Value
A 'data.frame' containing the combined and filtered data.
Examples
# Download data for the years 1987 and 1990
capes_files <- download_capes_data(c(1987, 1990))
# Combine all selected data
combined_data <- read_capes_data(capes_files)
Search for terms in text fields of the CAPES Catalog of Theses and Dissertations data
Description
This function allows searching for specific terms in the text fields of a previously loaded 'data.frame'.
Usage
search_capes_text(data, term, field)
buscar_texto_capes(data, term, field)
Arguments
| data | A 'data.frame' containing the CAPES Catalog of Theses and Dissertations data. | 
| term | A string, the term to search for. | 
| field | A string, the name of the field to search in (e.g., "resumo", "titulo"). | 
Value
A 'data.frame' with rows matching the search or a message indicating no results were found.
Examples
# Download data for the years 1987 and 1990
capes_files <- download_capes_data(c(1987, 1990))
# Combine all selected data
combined_data <- read_capes_data(capes_files)
# Search data
results <- search_capes_text(
data = combined_data,
term = "Educação",
  field = "titulo"
)
Identifiers (IDs) on OSF for the annual data of the Catalog of Theses and Dissertations from the Brazilian Coordination for the Improvement of Higher Education Personnel (CAPES)
Description
A data frame containing the years and the corresponding IDs for downloading the files.
Usage
years_osf
Format
A data frame with the following columns:
- year
- Year of the data (1987-2022). 
- osf_id
- OSF ID corresponding to the year. 
Source
Examples
data(years_osf)
head(years_osf)