Type: Package
Title: A Curated Collection of Pulmonary and Respiratory Disease Datasets
Version: 0.1.0
Maintainer: Renzo Caceres Rossi <arenzocaceresrossi@gmail.com>
Description: Provides a comprehensive and curated collection of datasets related to the lungs, respiratory system, and associated diseases. This package includes epidemiological, clinical, experimental, and simulated datasets on conditions such as lung cancer, asthma, Chronic Obstructive Pulmonary Disease (COPD), tuberculosis, whooping cough, pneumonia, influenza, and other respiratory illnesses. It is designed to support data exploration, statistical modeling, teaching, and research in pulmonary medicine, public health, environmental epidemiology, and respiratory disease surveillance.
License: GPL-3
Language: en
URL: https://github.com/lightbluetitan/pulmodatasets, https://lightbluetitan.github.io/pulmodatasets/
BugReports: https://github.com/lightbluetitan/pulmodatasets/issues
Encoding: UTF-8
LazyData: true
Depends: R (>= 4.1.0)
Imports: utils
Suggests: ggplot2, dplyr, testthat (>= 3.0.0), knitr, rmarkdown
RoxygenNote: 7.3.2
Config/testthat/edition: 3
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-05-31 04:23:57 UTC; renzocrossi
Author: Renzo Caceres Rossi [aut, cre]
Repository: CRAN
Date/Publication: 2025-06-03 13:00:09 UTC

PulmoDataSets: A Curated Collection of Pulmonary and Respiratory Disease Datasets

Description

This package provides a wide variety of datasets focused on the lungs, respiratory system, tuberculosis, whooping cough, pneumonia, influenza and associated diseases.

Author(s)

Maintainer: Renzo Caceres Rossi <arenzocaceresrossi@gmail.com>

See Also

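Useful links:

https://github.com/lightbluetitan/pulmodatasets

https://lightbluetitan.github.io/pulmodatasets/

Report bugs at https://github.com/lightbluetitan/pulmodatasets/issues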


UK Female Lung Disease Deaths

Description

This dataset, UK_female_lung_deaths_ts, is a time series object containing monthly deaths from bronchitis, emphysema and asthma in the UK from 1974 to 1979, for females.

Usage

data(UK_female_lung_deaths_ts)

Format

A time series (ts) object with 72 monthly observations from 1974 to 1979.

value

Number of deaths (numeric vector)

time

Time index (1974 to 1979)

Details

The dataset name has been kept as 'UK_female_lung_deaths_ts' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'ts' indicates that the dataset is a time series object. The original content has not been modified in any way.
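
The suffix convention described above can be checked programmatically. The sketch below is illustrative only: dataset_suffix() and known_suffixes are hypothetical names that are not part of PulmoDataSets, and the suffix list is assumed to be limited to the four suffixes that appear in this manual.

```r
# Hypothetical helper (not part of PulmoDataSets): report the type suffix
# of a dataset name. "tbl_df" is listed first so it is not mistaken for "df".
known_suffixes <- c("tbl_df", "df", "dt", "ts")

dataset_suffix <- function(name) {
  for (s in known_suffixes) {
    if (endsWith(name, paste0("_", s))) return(s)
  }
  NA_character_  # the name does not follow the convention
}

dataset_suffix("UK_female_lung_deaths_ts")
dataset_suffix("asthma_patients_tbl_df")
```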

Source

Data taken from the datasets package (R version 4.5.0), fdeaths dataset
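
Because the Source states that the series is taken unchanged from fdeaths, it can be explored with datasets::fdeaths even when PulmoDataSets is not installed. The following is a minimal sketch under that assumption; the object names deaths, annual_totals, and monthly_means are illustrative.

```r
# Assumption: UK_female_lung_deaths_ts has the same content as datasets::fdeaths.
deaths <- datasets::fdeaths

# Confirm the structure described in the Format section.
stopifnot(is.ts(deaths), frequency(deaths) == 12, length(deaths) == 72)

# Total deaths per calendar year (six values, 1974 to 1979).
annual_totals <- aggregate(deaths, nfrequency = 1, FUN = sum)

# Average deaths for each calendar month across the six years.
monthly_means <- tapply(as.numeric(deaths), as.integer(cycle(deaths)), mean)

print(annual_totals)
print(round(monthly_means, 1))
```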


UK Male Lung Disease Deaths

Description

This dataset, UK_male_lung_deaths_ts, is a time series object containing monthly deaths from bronchitis, emphysema and asthma in the UK from 1974 to 1979, for males.

Usage

data(UK_male_lung_deaths_ts)

Format

A time series (ts) object with 72 monthly observations from 1974 to 1979.

value

Number of deaths (numeric vector)

time

Time index (1974 to 1979)

Details

The dataset name has been kept as 'UK_male_lung_deaths_ts' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'ts' indicates that the dataset is a time series object. The original content has not been modified in any way.

Source

Data taken from the datasets package (R version 4.5.0), mdeaths dataset
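
The male and female series cover the same 72 months, so they can be combined into one multivariate time series for comparison. This sketch assumes, as the Source sections state, that the two PulmoDataSets objects match datasets::mdeaths and datasets::fdeaths; the names both and ratio are illustrative.

```r
# Assumption: the PulmoDataSets series are identical to the base R originals.
both <- cbind(male = datasets::mdeaths, female = datasets::fdeaths)

# Month-by-month ratio of male to female deaths.
ratio <- both[, "male"] / both[, "female"]

print(summary(as.numeric(ratio)))

# Correlation between the two monthly series.
print(cor(both[, "male"], both[, "female"]))
```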


US Mortality Rates by Cause and Gender

Description

This dataset, USMortality_df, is a data frame containing mortality rates across all ages in the USA by cause of death, sex, rural and urban status from 2011 to 2013. The data represent national aggregate rates under the Department of Health and Human Services (HHS).

Usage

data(USMortality_df)

Format

A data frame with 40 observations and 5 variables:

Status

Rural/Urban status (factor with 2 levels)

Sex

Gender (factor with 2 levels)

Cause

Cause of death (factor with 10 levels)

Rate

Mortality rate (numeric vector)

SE

Standard error of mortality rate (numeric vector)

Details

The dataset name has been kept as 'USMortality_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a standard data frame. The original content has not been modified in any way.

Source

Data taken from the lattice package version 0.22-6
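
Since each rate comes with a standard error, approximate confidence limits can be derived as Rate plus or minus a normal quantile times SE. The sketch below assumes a normal approximation is acceptable; add_rate_ci() is a hypothetical helper, not a PulmoDataSets function.

```r
# Hypothetical helper: add approximate confidence limits to a data frame
# that has Rate and SE columns, as USMortality_df does.
add_rate_ci <- function(df, level = 0.95) {
  z <- stats::qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  df$lower <- df$Rate - z * df$SE
  df$upper <- df$Rate + z * df$SE
  df
}

# Runs only when PulmoDataSets is installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("USMortality_df", package = "PulmoDataSets")
  print(head(add_rate_ci(USMortality_df)))
}
```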


US Regional Mortality Rates by Cause and Gender

Description

This dataset, USRegionalMortality_df, is a data frame containing region-wise mortality rates across all ages in the USA by cause of death, sex, rural and urban status from 2011 to 2013. The data represent rates for each administrative region under the Department of Health and Human Services (HHS).

Usage

data(USRegionalMortality_df)

Format

A data frame with 400 observations and 6 variables:

Region

HHS administrative region (factor with 10 levels)

Status

Rural/Urban status (factor with 2 levels)

Sex

Gender (factor with 2 levels)

Cause

Cause of death (factor with 10 levels)

Rate

Mortality rate (numeric vector)

SE

Standard error of mortality rate (numeric vector)

Details

The dataset name has been kept as 'USRegionalMortality_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.

Source

Data taken from the lattice package version 0.22-6


AI Assessment of Pulmonary Nodules

Description

This dataset, ai_ipn_performance_dt, is a data table containing performance metrics of an artificial intelligence tool for risk stratification of 200 indeterminate pulmonary nodules (IPNs) on chest CT scans.

Usage

data(ai_ipn_performance_dt)

Format

A data table with 200 observations and 2 variables:

cancer

Malignancy status (0 = benign, 1 = malignant) (integer)

rating

AI risk assessment rating (integer)

Details

The dataset name has been kept as 'ai_ipn_performance_dt' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'dt' indicates that this is a data table object. The original content has not been modified in any way.

Source

Data taken from the R4HCR package version 0.1


Air Pollution and Mortality

Description

This dataset, air_polution_mortality_df, is a data frame containing information from an early study exploring the relationship between air pollution and mortality across 60 Standard Metropolitan Statistical Areas in the U.S. between 1959 and 1961.

Usage

data(air_polution_mortality_df)

Format

A data frame with 60 observations and 7 variables:

City

Metropolitan area (factor with 60 levels)

Mort

Mortality rate (numeric vector)

Precip

Annual precipitation in inches (integer vector)

Educ

Median years of education (numeric vector)

NonWhite

Percentage of non-white population (numeric vector)

NOX

Nitrogen oxide concentration (integer vector)

SO2

Sulfur dioxide concentration (integer vector)

Details

The dataset name has been kept as 'air_polution_mortality_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.

Source

Data taken from the Sleuth3 package version 1.0-6


COPD and Asthma Patients

Description

This dataset, asthma_patients_tbl_df, is a tibble containing clinical information about 300 asthma/COPD patients tracked over 3 years, including demographics, smoking status, diagnosis details, medications, and peak flow measurements.

Usage

data(asthma_patients_tbl_df)

Format

A tibble with 300 observations and 7 variables:

Patient_ID

Unique patient identifier (numeric)

Age

Patient age in years (numeric)

Gender

Patient gender (character)

Smoking_Status

Current/Former/Never smoker status (character)

Asthma_Diagnosis

Specific asthma/COPD diagnosis (character)

Medication

Prescribed treatment regimen (character)

Peak_Flow

Peak expiratory flow rate (numeric)

Details

The dataset name has been kept as 'asthma_patients_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'tbl_df' indicates that the dataset is a tibble object. The original content has not been modified in any way.

Source

Data taken from Kaggle: https://www.kaggle.com/datasets/jatinthakur706/copd-asthma-patient-dataset


Chronic Bronchitis in Cardiff Men

Description

This dataset, bronchitis_Cardiff_df, is a data frame containing information from a study assessing the effects of smoking and pollution on bronchitis diagnosis in a sample of 212 men from Cardiff.

Usage

data(bronchitis_Cardiff_df)

Format

A data frame with 212 observations and 4 variables:

cig

Number of cigarettes smoked per day (numeric)

poll

Pollution exposure level (numeric)

r

Bronchitis diagnosis (0 = no, 1 = yes) (integer)

rfac

Bronchitis diagnosis as a factor with 2 levels (factor)

Details

The dataset name has been kept as 'bronchitis_Cardiff_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.

Source

Data taken from the gamclass package version 0.62.5


Chicago Mortality and Pollution

Description

This dataset, chicago_pollution_df, is a data frame containing daily mortality, weather, and pollution data for Chicago from 1987 to 2000, drawn from the National Morbidity, Mortality and Air Pollution Study (NMMAPS). It includes all-cause, cardiovascular, and respiratory mortality counts, temperature, humidity, and pollution levels (PM10 and ozone).

Usage

data(chicago_pollution_df)

Format

A data frame with 5114 observations and 14 variables:

date

Date (Date object)

time

Time index (integer vector)

year

Year (numeric vector)

month

Month (numeric vector)

doy

Day of year (integer vector)

dow

Day of week (factor with 7 levels)

death

All-cause mortality count (integer vector)

cvd

Cardiovascular mortality count (integer vector)

resp

Respiratory mortality count (integer vector)

temp

Temperature (numeric vector)

dptp

Dew point temperature (numeric vector)

rhum

Relative humidity (numeric vector)

pm10

PM10 pollution level (numeric vector)

o3

Ozone level (numeric vector)

Details

The dataset name has been kept as 'chicago_pollution_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a standard data frame. The original content has not been modified in any way.

Source

Data taken from the dlnm package version 2.4.10
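
Daily counts are often easier to inspect after aggregation. The sketch below totals respiratory deaths per calendar month using the year, month, and resp columns listed in the Format section; monthly_resp_deaths() is a hypothetical helper, not part of PulmoDataSets.

```r
# Hypothetical helper: total respiratory deaths per calendar month.
# The formula interface of aggregate() drops rows with missing values.
monthly_resp_deaths <- function(df) {
  out <- stats::aggregate(resp ~ year + month, data = df, FUN = sum)
  out[order(out$year, out$month), ]
}

# Runs only when PulmoDataSets is installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("chicago_pollution_df", package = "PulmoDataSets")
  print(head(monthly_resp_deaths(chicago_pollution_df), 12))
}
```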


Child Wheeze and Pollution

Description

This dataset, child_wheeze_pollution_df, is a data frame containing longitudinal data on wheezing status for 16 children, each measured once a year at ages 9 through 12 (four measurements per child), with associated pollution exposure information.

Usage

data(child_wheeze_pollution_df)

Format

A data frame with 64 observations and 5 variables:

ID

Child identifier (integer vector)

Wheeze

Wheezing status (integer vector)

City

City identifier (integer vector)

Age

Child's age in years (integer vector)

Smoke

Smoking exposure indicator (integer vector)

Details

The dataset name has been kept as 'child_wheeze_pollution_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.

Source

Data taken from the geessbin package version 1.0.0


Children Respiratory Rates Data

Description

This dataset, children_respiratory_rates_df, is a data frame containing respiratory rate measurements from 618 Italian children aged between 15 days and 3 years, collected to establish normal respiratory rate distributions for clinical assessment.

Usage

data(children_respiratory_rates_df)

Format

A data frame with 618 observations and 2 variables:

Age

Child's age in days (numeric vector)

Rate

Respiratory rate in breaths per minute (integer vector)

Details

The dataset name has been kept as 'children_respiratory_rates_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.

Source

Data taken from the Sleuth3 package version 1.0-6


Lung Cancer in 4 Danish Cities 1968-71

Description

This dataset, danish_lung_incidence_df, is a data frame containing counts of incident lung cancer cases and population size in four neighbouring Danish cities by age group from 1968 to 1971.

Usage

data(danish_lung_incidence_df)

Format

A data frame with 24 observations and 4 variables:

city

City of observation (factor with 4 levels)

age

Age group (factor with 6 levels)

pop

Population size (integer)

cases

Number of incident lung cancer cases (integer)

Details

The dataset name has been kept as 'danish_lung_incidence_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.

Source

Data taken from the ISwR package version 2.0-10
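
With case counts and population sizes side by side, a crude incidence rate and a Poisson regression with a population offset are natural first steps. The sketch below is one possible approach, not a prescribed analysis; incidence_per_1000() is a hypothetical helper, and the rate refers to the whole 1968 to 1971 period rather than a single year.

```r
# Hypothetical helper: crude incidence per 1000 persons over the study period.
incidence_per_1000 <- function(df) {
  df$rate_per_1000 <- 1000 * df$cases / df$pop
  df
}

# Runs only when PulmoDataSets is installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("danish_lung_incidence_df", package = "PulmoDataSets")
  print(incidence_per_1000(danish_lung_incidence_df))

  # Poisson regression of case counts with log population as the offset.
  fit <- stats::glm(cases ~ city + age + offset(log(pop)),
                    family = stats::poisson(),
                    data = danish_lung_incidence_df)
  print(summary(fit)$coefficients)
}
```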


England and Wales Lung and Nasal Cancer Mortality 1936–80

Description

This dataset, engwales_cancer_mortality_df, is a data frame containing England and Wales mortality rates from lung cancer, nasal cancer, and all causes between 1936 and 1980. The 1936 rates are repeated as 1931 rates in order to accommodate follow-up for the nickel study.

Usage

data(engwales_cancer_mortality_df)

Format

A data frame with 150 observations and 5 variables:

year

Year of observation (numeric)

age

Age group (numeric)

lung

Lung cancer mortality rate (numeric)

nasal

Nasal cancer mortality rate (numeric)

other

Mortality rate from all other causes (numeric)

Details

The dataset name has been kept as 'engwales_cancer_mortality_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.

Source

Data taken from the ISwR package version 2.0-10


US 1975-76 Influenza-Like Illness Data

Description

This dataset, influenza_us_1975_df, is a data frame containing influenza-like illness (ILI) data for the lower 48 US states and the District of Columbia during the 1975-76 season, which was dominated by the A/H3N2 Victoria strain.

Usage

data(influenza_us_1975_df)

Format

A data frame with 49 observations (states + DC) and 7 variables:

State

State identifier (integer)

Acronym

State abbreviation (factor with 51 levels)

Pop

State population (integer)

Latitude

Geographic latitude (numeric)

Longitude

Geographic longitude (numeric)

Start

Week of season start (integer)

Peak

Week of peak activity (integer)

Details

The dataset name has been kept as 'influenza_us_1975_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a standard data frame. The original content has not been modified in any way.

Source

Data taken from the epimdr package version 0.6-5


Lung Cancer Survival Data

Description

This dataset, lung_cancer_survival_df, is a data frame containing survival information for 228 lung cancer patients, with 10 clinical variables including survival time, patient status, age, gender, performance scores, and nutritional indicators.

Usage

data(lung_cancer_survival_df)

Format

A data frame with 228 observations (patients) and 10 variables:

inst

Institution code where patient was treated (numeric)

time

Survival time in days from diagnosis (numeric)

status

Censoring status (1 = censored, 2 = died) (numeric)

age

Patient age at diagnosis in years (numeric)

sex

Gender (1 = male, 2 = female) (numeric)

ph.ecog

ECOG performance score (0=asymptomatic to 4=fully disabled) (numeric)

ph.karno

Karnofsky performance score (0-100) as rated by physician (numeric)

pat.karno

Karnofsky performance score (0-100) as self-reported by patient (numeric)

meal.cal

Daily calories consumed at meals (numeric)

wt.loss

Weight loss in last six months (pounds) (numeric)

Details

The dataset name has been kept as 'lung_cancer_survival_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.

Source

Data taken from the acro package version 0.1.4
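
The numeric codes for status and sex usually need recoding before a survival analysis. The sketch below shows one way to do that, using the codes given in the Format section; recode_lung() is a hypothetical helper, and the Kaplan-Meier step assumes the survival package is available.

```r
# Hypothetical helper: derive a 0/1 event indicator and a labelled sex factor.
recode_lung <- function(df) {
  df$event <- as.integer(df$status == 2)  # 1 = died, 0 = censored
  df$sex_label <- factor(df$sex, levels = c(1, 2),
                         labels = c("male", "female"))
  df
}

# Runs only when both PulmoDataSets and survival are installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE) &&
    requireNamespace("survival", quietly = TRUE)) {
  data("lung_cancer_survival_df", package = "PulmoDataSets")
  d <- recode_lung(lung_cancer_survival_df)

  # Kaplan-Meier estimates of survival by sex.
  km <- survival::survfit(survival::Surv(time, event) ~ sex_label, data = d)
  print(km)
}
```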


Incidental or Screen-Detected Lung Nodules

Description

This dataset, lung_nodules_detection_dt, is a data table containing clinical and radiological characteristics of 999 pulmonary nodules (up to 15 mm in size) detected on routine chest CT scans from 3 UK academic centers.

Usage

data(lung_nodules_detection_dt)

Format

A data table with 999 observations and 8 variables:

sex

Patient sex (factor with 2 levels)

age

Patient age in years (numeric)

num.annotated

Number of annotated nodules (numeric)

location

Nodule location (factor with 6 levels)

spiculate

Spiculation status (factor with 2 levels)

smoke.status

Smoking history (factor with 5 levels)

diameter

Nodule diameter in mm (numeric)

malignant

Malignancy status (0 = benign, 1 = malignant) (numeric)

Details

The dataset name has been kept as 'lung_nodules_detection_dt' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'dt' indicates that this is a data table object. The original content has not been modified in any way.

Source

Data taken from the R4HCR package version 0.1


Male Lung Cancer by Smoking Duration

Description

This dataset, lungca_cancer_deaths_df, is a data frame containing data on man-years of smoking risk and observed lung cancer deaths among male smokers. It includes 63 observations across 4 variables measuring smoking exposure and mortality outcomes.

Usage

data(lungca_cancer_deaths_df)

Format

A data frame with 63 observations and 4 variables:

yrs_smk

Years of smoking (factor with 9 levels)

pys

Person-years of smoking exposure (numeric)

num_cigs

Number of cigarettes smoked daily (factor with 7 levels)

deaths

Number of lung cancer deaths (numeric)

Details

The dataset name has been kept as 'lungca_cancer_deaths_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a standard data frame. The original content has not been modified in any way.

Source

Data taken from the R4HCR package version 0.1


Neonatal Intubation Simulation

Description

This dataset, neonatal_intubation_times_df, is a data frame containing execution times (in seconds) for specific actions performed by 37 midwife students during a high-fidelity neonatal resuscitation simulation. The simulation was video recorded, and each critical action in the intubation process was tagged for timing analysis.

Usage

data(neonatal_intubation_times_df)

Format

A data frame with 37 observations and 7 variables:

id

Participant ID (integer)

deci_intub

Time to decision to intubate (seconds) (integer)

stop_ventil

Time to stop ventilation (seconds) (integer)

blade_in

Time to insert laryngoscope blade (seconds) (integer)

insert_tube

Time to insert endotracheal tube (seconds) (integer)

blade_out

Time to remove laryngoscope blade (seconds) (integer)

restart_ventil

Time to restart ventilation (seconds) (integer)

Details

The dataset name has been kept as 'neonatal_intubation_times_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.

Source

Data taken from the ViSiElse package version 1.2.2


Nicotine Gum and Smoking Cessation

Description

This dataset, nicotine_gum_df, is a data frame containing meta-analysis data on the effectiveness of nicotine gum for smoking cessation across 26 studies.

Usage

data(nicotine_gum_df)

Format

A data frame with 26 observations (studies) and 4 variables:

qt

Number of successful quitters in treatment group (integer)

tt

Total participants in treatment group (integer)

qc

Number of successful quitters in control group (integer)

tc

Total participants in control group (integer)

Details

The dataset name has been kept as 'nicotine_gum_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.

Source

Data taken from the HSAUR3 package version 1.0-15
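
The four count columns are enough to compute a per-study odds ratio for quitting, which is the usual starting point for a meta-analysis. The sketch below uses the standard 2x2 formulas without a continuity correction, so a study with a zero cell would yield an infinite or undefined value; study_odds_ratio() is a hypothetical helper, not part of PulmoDataSets.

```r
# Hypothetical helper: odds ratio of quitting (treatment vs control),
# its logarithm, and the standard error of the log odds ratio.
study_odds_ratio <- function(qt, tt, qc, tc) {
  quit_t <- qt            # quitters, treatment group
  fail_t <- tt - qt       # non-quitters, treatment group
  quit_c <- qc            # quitters, control group
  fail_c <- tc - qc       # non-quitters, control group
  log_or <- log((quit_t * fail_c) / (fail_t * quit_c))
  se <- sqrt(1 / quit_t + 1 / fail_t + 1 / quit_c + 1 / fail_c)
  data.frame(or = exp(log_or), log_or = log_or, se_log_or = se)
}

# Runs only when PulmoDataSets is installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("nicotine_gum_df", package = "PulmoDataSets")
  print(head(with(nicotine_gum_df, study_odds_ratio(qt, tt, qc, tc))))
}
```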


Ohio Children Wheeze Status

Description

This dataset, ohio_children_wheeze_df, is a data frame containing 2148 observations of wheeze status in children from Ohio. The data are a subset of the Six-City Study, a longitudinal study examining the health effects of air pollution on children.

Usage

data(ohio_children_wheeze_df)

Format

A data frame with 2148 observations and 4 variables:

resp

Wheeze status (0 = no wheeze, 1 = wheeze) (integer)

id

Child identifier (integer)

age

Age of the child in years, centered at 9 (0 = 9 years old) (integer)

smoke

Maternal smoking status at the start of the study (0 = no, 1 = yes) (integer)

Details

The dataset name has been kept as 'ohio_children_wheeze_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.

Source

Data taken from the geepack package version 1.3.12


Lung Disease Patients

Description

This dataset, patients_lung_diseases_tbl_df, is a tibble containing detailed clinical information about 5,200 patients with various lung conditions, including demographics, smoking status, lung capacity measurements, disease types, treatments received, hospital visits, and recovery status.

Usage

data(patients_lung_diseases_tbl_df)

Format

A tibble with 5,200 observations and 8 variables:

Age

Patient age in years (numeric)

Gender

Patient gender (character)

Smoking Status

Smoker or non-smoker status (character)

Lung Capacity

Measured lung function (numeric)

Disease Type

Specific lung condition (character)

Treatment Type

Therapy, medication or surgery received (character)

Hospital Visits

Number of hospital visits (numeric)

Recovered

Recovery status (character)

Details

The dataset name has been kept as 'patients_lung_diseases_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'tbl_df' indicates that the dataset is a tibble object. The original content has not been modified in any way.

Source

Data taken from Kaggle: https://www.kaggle.com/datasets/samikshadalvi/lungs-diseases-dataset


Monthly Pneumonia and Influenza Deaths in the U.S.

Description

This dataset, pneumonia_influenza_ts, is a time series containing monthly rates of pneumonia and influenza deaths in the United States from 1968 to 1978.

Usage

data(pneumonia_influenza_ts)

Format

A time series with 132 monthly observations from January 1968 to December 1978:

Value

Mortality rate (numeric vector)

Time

Monthly index from 1968 to 1978 (time series vector)

Details

The dataset name has been kept as 'pneumonia_influenza_ts' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'ts' indicates that the dataset is a time series. The original content has not been modified in any way.

Source

Data taken from the astsa package version 2.2


Respiratory Clinical Trial

Description

This dataset, respiratory_clinical_trial_df, is a data frame containing information from a clinical trial of patients with respiratory illness, where 111 patients from two different clinics were randomized to receive either placebo or an active treatment. Patients were examined at baseline and at four visits during treatment. The respiratory status was determined at each visit, with 1 representing good status and 0 representing poor status.

Usage

data(respiratory_clinical_trial_df)

Format

A data frame with 444 observations and 8 variables:

center

Clinic (center) identifier (integer vector)

id

Patient identifier (integer vector)

treat

Treatment group (factor with 2 levels)

sex

Patient sex (factor with 2 levels)

age

Patient age in years (integer vector)

baseline

Baseline respiratory status (integer vector)

visit

Visit number (integer vector)

outcome

Respiratory status (integer vector)

Details

The dataset name has been kept as 'respiratory_clinical_trial_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.

Source

Data taken from the geepack package version 1.3.12
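
Because outcome is coded 1 for good and 0 for poor status, its mean within a group is the proportion of patients with good status. The sketch below tabulates that proportion by treatment and visit; good_status_by_visit() is a hypothetical helper, not part of PulmoDataSets.

```r
# Hypothetical helper: proportion of good respiratory status,
# cross-tabulated by treatment group (rows) and visit (columns).
good_status_by_visit <- function(df) {
  tapply(df$outcome, list(treat = df$treat, visit = df$visit), mean)
}

# Runs only when PulmoDataSets is installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("respiratory_clinical_trial_df", package = "PulmoDataSets")
  print(round(good_status_by_visit(respiratory_clinical_trial_df), 2))
}
```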


Azithromycin for Respiratory Infections

Description

This dataset, respiratory_infections_df, is a data frame containing results from 15 clinical trials comparing the effectiveness of azithromycin versus amoxycillin or amoxycillin/clavulanic acid (amoxyclav) in the treatment of acute lower respiratory tract infections.

Usage

data(respiratory_infections_df)

Format

A data frame with 15 observations and 11 variables:

author

Study author(s) (character vector)

year

Year of publication (integer vector)

ai

Number of successful treatments in azithromycin group (integer vector)

n1i

Total number of participants in azithromycin group (integer vector)

ci

Number of successful treatments in control group (integer vector)

n2i

Total number of participants in control group (integer vector)

age

Patient age characteristics (character vector)

diag.ab

Number diagnosed with acute bronchitis (integer vector)

diag.cb

Number diagnosed with chronic bronchitis (integer vector)

diag.pn

Number diagnosed with pneumonia (integer vector)

ctrl

Type of control treatment (character vector)

Details

The dataset name has been kept as 'respiratory_infections_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.

Source

Data taken from the metadat package version 1.4-0


Respiratory Illness Clinical Trial

Description

This dataset, respiratory_trial_df, is a data frame containing the respiratory status of patients recruited for a randomized clinical multicenter trial, with 555 observations across 111 subjects.

Usage

data(respiratory_trial_df)

Format

A data frame with 555 observations and 7 variables:

centre

Study center (factor with 2 levels)

treatment

Treatment group (factor with 2 levels)

gender

Patient gender (factor with 2 levels)

age

Patient age in years (numeric)

status

Respiratory status (factor with 2 levels)

month

Follow-up month (ordered factor with 5 levels)

subject

Patient identifier (factor with 111 levels)

Details

The dataset name has been kept as 'respiratory_trial_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a standard data frame. The original content has not been modified in any way.

Source

Data taken from the HSAUR3 package version 1.0-15


Ordinal Respiratory Outcomes

Description

This dataset, respiratory_trial_outcomes_df, is a data frame containing outcome data from a randomized clinical trial, described in Miller et al. (1993), that evaluated a new treatment for a respiratory disorder. The study includes 111 patients who were randomly assigned to one of two treatments (active or placebo). The patients were followed up at four visits, and their response status was classified on an ordinal scale at each visit.

Usage

data(respiratory_trial_outcomes_df)

Format

A data frame with 111 observations and 5 variables:

y1

Ordinal response at visit 1 (integer)

y2

Ordinal response at visit 2 (integer)

y3

Ordinal response at visit 3 (integer)

y4

Ordinal response at visit 4 (integer)

trt

Treatment group (0 = placebo, 1 = active) (integer)

Details

The dataset name has been kept as 'respiratory_trial_outcomes_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.

Source

Data taken from the geepack package version 1.3.12
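
The data are stored in wide layout, with one row per patient and one column per visit, whereas many longitudinal modelling functions expect one row per patient-visit. The sketch below converts the layout with base R; outcomes_to_long() and the added id column are hypothetical, since the dataset itself carries no patient identifier.

```r
# Hypothetical helper: reshape y1..y4 into one row per patient and visit.
outcomes_to_long <- function(df) {
  df$id <- seq_len(nrow(df))  # row number used as a patient identifier
  long <- stats::reshape(df, direction = "long",
                         varying = c("y1", "y2", "y3", "y4"),
                         v.names = "y", timevar = "visit",
                         times = 1:4, idvar = "id")
  rownames(long) <- NULL
  long[order(long$id, long$visit), c("id", "trt", "visit", "y")]
}

# Runs only when PulmoDataSets is installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("respiratory_trial_outcomes_df", package = "PulmoDataSets")
  print(head(outcomes_to_long(respiratory_trial_outcomes_df), 8))
}
```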


UK Smoking Habits

Description

This dataset, smoking_UK_tbl_df, is a tibble containing survey data on smoking habits from the UK, with demographic characteristics and tobacco consumption patterns from 1,691 respondents.

Usage

data(smoking_UK_tbl_df)

Format

A tibble with 1,691 observations and 12 variables:

gender

Gender of respondent (factor with 2 levels)

age

Age in years (integer)

marital_status

Marital status (factor with 5 levels)

highest_qualification

Highest education qualification (factor with 8 levels)

nationality

Nationality (factor with 8 levels)

ethnicity

Ethnic group (factor with 7 levels)

gross_income

Income bracket (factor with 10 levels)

region

UK region (factor with 7 levels)

smoke

Smoking status (factor with 2 levels)

amt_weekends

Cigarettes smoked on weekends (integer)

amt_weekdays

Cigarettes smoked on weekdays (integer)

type

Type of tobacco used (factor with 5 levels)

Details

The dataset name has been kept as 'smoking_UK_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'tbl_df' indicates that this is a tibble data frame. The original content has not been modified in any way.

Source

Data taken from the openintro package version 2.5.0
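
Examples

A common first look at survey data of this kind is the share of each smoking status within a demographic group. The sketch below uses only base R. The helper name smoking_share is an illustrative assumption, and no particular factor level names are assumed.

```r
# Hypothetical helper (not part of PulmoDataSets): row-wise proportions of
# the `smoke` levels within each level of a grouping column.
smoking_share <- function(df, by) {
  stopifnot(by %in% names(df), "smoke" %in% names(df))
  tab <- table(df[[by]], df[["smoke"]])
  prop.table(tab, margin = 1)  # each row sums to 1
}

if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("smoking_UK_tbl_df", package = "PulmoDataSets")
  print(round(smoking_share(smoking_UK_tbl_df, "gender"), 3))
}
```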


Smoking Deaths Among British Doctors

Description

This dataset, smoking_doctors_df, is a data frame containing data from a study on smoking habits and coronary artery disease mortality among British doctors. It includes 10 observations across 5 variables representing person-years of observation and deaths during the study period.

Usage

data(smoking_doctors_df)

Format

A data frame with 10 observations and 5 variables:

age

Age group (factor with 5 levels)

smoke

Smoking status (numeric)

n

Number of person-years at risk (numeric)

y

Number of deaths from coronary artery disease (numeric)

ns

Number of smoker person-years in the category, i.e. smoke * n (numeric)

Details

The dataset name has been kept as 'smoking_doctors_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a standard data frame. The original content has not been modified in any way.

Source

Data taken from the boot package version 1.3-31
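
Examples

Because the table records deaths (y) together with person-years at risk (n), a crude death rate can be derived directly from the two columns. The sketch below expresses it per 1,000 person-years. The helper name death_rate_per_1000 is an illustrative assumption.

```r
# Hypothetical helper (not part of PulmoDataSets): crude death rate per
# 1,000 person-years, computed row by row from deaths (y) and person-years (n).
death_rate_per_1000 <- function(df) {
  stopifnot(all(c("n", "y") %in% names(df)))
  1000 * df$y / df$n
}

if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("smoking_doctors_df", package = "PulmoDataSets")
  rates <- cbind(
    smoking_doctors_df[, c("age", "smoke")],
    rate_per_1000 = death_rate_per_1000(smoking_doctors_df)
  )
  print(rates)
}
```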


Smoking and Lung Cancer

Description

This dataset, smoking_lung_cancer_df, is a data frame containing data from a retrospective case-control study comparing smoking status between 86 lung cancer patients and 86 controls.

Usage

data(smoking_lung_cancer_df)

Format

A data frame with 2 observations and 3 variables:

Smoking

Smoking status (factor with 2 levels: "NonSmokers", "Smokers")

Cancer

Number of lung cancer cases (integer vector)

Control

Number of control cases (integer vector)

Details

The dataset name has been kept as 'smoking_lung_cancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.

Source

Data taken from the Sleuth3 package version 1.0-6
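
Examples

In a case-control design the natural summary of a 2 x 2 table is the odds ratio. The sketch below computes the odds of lung cancer for smokers relative to non-smokers from the two rows described above. The helper name smoking_odds_ratio is an illustrative assumption.

```r
# Hypothetical helper (not part of PulmoDataSets): odds ratio of lung cancer
# for smokers versus non-smokers from the 2 x 2 case-control counts.
smoking_odds_ratio <- function(df) {
  s  <- df[df$Smoking == "Smokers", ]
  ns <- df[df$Smoking == "NonSmokers", ]
  stopifnot(nrow(s) == 1, nrow(ns) == 1)
  (s$Cancer * ns$Control) / (ns$Cancer * s$Control)
}

if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("smoking_lung_cancer_df", package = "PulmoDataSets")
  print(smoking_odds_ratio(smoking_lung_cancer_df))
}
```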


Youth Smoking and Lung Function

Description

This dataset, smoking_youth_tbl_df, is a tibble containing data from the Childhood Respiratory Disease Study collected in the late 1970s, examining the effects of smoking and second-hand smoke exposure on pulmonary function in 654 youths.

Usage

data(smoking_youth_tbl_df)

Format

A tibble with 654 observations and 5 variables:

age

Age in years (integer)

FEV

Forced Expiratory Volume in liters (numeric)

height

Height in inches (numeric)

sex

Sex of participant (character)

smoker

Smoking status (character)

Details

The dataset name has been kept as 'smoking_youth_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'tbl_df' indicates that this is a tibble data frame. The original content has not been modified in any way.

Source

Data taken from the LSTbook package version 0.6
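
Examples

Since FEV grows with age and body size, a comparison of smokers and non-smokers is usually adjusted for both. The following is a minimal sketch of such an adjusted linear model using stats::lm. The helper name fit_fev_model and the choice of covariates are illustrative assumptions, not a recommended analysis.

```r
# Hypothetical helper (not part of PulmoDataSets): linear model of FEV on
# age, height, and smoking status.
fit_fev_model <- function(df) {
  stopifnot(all(c("FEV", "age", "height", "smoker") %in% names(df)))
  lm(FEV ~ age + height + smoker, data = df)
}

if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("smoking_youth_tbl_df", package = "PulmoDataSets")
  fit <- fit_fev_model(smoking_youth_tbl_df)
  print(coef(fit))
}
```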


Total Lung Capacity

Description

This dataset, tlc_lung_capacity_df, is a data frame containing data on pretransplant total lung capacity (TLC) measured by whole-body plethysmography for recipients of heart-lung transplants.

Usage

data(tlc_lung_capacity_df)

Format

A data frame with 32 observations and 4 variables:

age

Age in years (integer)

sex

Sex (1 = female, 2 = male) (integer)

height

Height in centimeters (integer)

tlc

Total lung capacity in liters (numeric)

Details

The dataset name has been kept as 'tlc_lung_capacity_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.

Source

Data taken from the ISwR package version 2.0-10
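
Examples

A simple descriptive summary is the mean total lung capacity within each sex code. The sketch below uses stats::aggregate. The helper name mean_tlc_by_sex is an illustrative assumption, and the code makes no assumption about how the sex codes are labelled.

```r
# Hypothetical helper (not part of PulmoDataSets): mean TLC for each value
# of the `sex` column.
mean_tlc_by_sex <- function(df) {
  stopifnot(all(c("sex", "tlc") %in% names(df)))
  aggregate(tlc ~ sex, data = df, FUN = mean)
}

if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("tlc_lung_capacity_df", package = "PulmoDataSets")
  print(mean_tlc_by_sex(tlc_lung_capacity_df))
}
```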


BCG Vaccine Against Tuberculosis

Description

This dataset, tuberculosis_vaccine_df, is a data frame containing results from 13 clinical trials examining the effectiveness of the Bacillus Calmette-Guerin (BCG) vaccine against tuberculosis.

Usage

data(tuberculosis_vaccine_df)

Format

A data frame with 13 observations and 9 variables:

trial

Trial identifier number (integer vector)

author

Study author(s) (character vector)

year

Year of publication (integer vector)

tpos

Number of TB positive cases in vaccinated group (integer vector)

tneg

Number of TB negative cases in vaccinated group (integer vector)

cpos

Number of TB positive cases in control group (integer vector)

cneg

Number of TB negative cases in control group (integer vector)

ablat

Absolute latitude of study location, in degrees (integer vector)

alloc

Method of treatment allocation (character vector)

Details

The dataset name has been kept as 'tuberculosis_vaccine_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.

Source

Data taken from the metadat package version 1.4-0
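
Examples

The four count columns define a 2 x 2 table per trial, from which a log risk ratio and its large-sample variance can be computed as inputs to a meta-analysis. The sketch below does this in base R. The helper name bcg_log_risk_ratio and the output column names are illustrative assumptions, and dedicated meta-analysis packages offer more complete tooling.

```r
# Hypothetical helper (not part of PulmoDataSets): per-trial log risk ratio
# (vaccinated versus control) and its approximate sampling variance.
bcg_log_risk_ratio <- function(df) {
  n_vacc <- df$tpos + df$tneg
  n_ctrl <- df$cpos + df$cneg
  log_rr <- log((df$tpos / n_vacc) / (df$cpos / n_ctrl))
  # Standard large-sample variance of a log risk ratio
  var_log_rr <- 1 / df$tpos - 1 / n_vacc + 1 / df$cpos - 1 / n_ctrl
  data.frame(trial = df$trial, log_rr = log_rr, var_log_rr = var_log_rr)
}

if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("tuberculosis_vaccine_df", package = "PulmoDataSets")
  print(bcg_log_risk_ratio(tuberculosis_vaccine_df))
}
```

Negative values of log_rr correspond to a lower risk of tuberculosis in the vaccinated group.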


Veterans Administration Lung Cancer Study

Description

This dataset, veterans_lung_cancer_df, is a data frame containing information from a randomized trial of two treatment regimens for lung cancer. This is a standard survival analysis data set.

Usage

data(veterans_lung_cancer_df)

Format

A data frame with 137 observations and 8 variables:

trt

Treatment group (numeric)

celltype

Cell type (factor with 4 levels)

time

Survival time in days (numeric)

status

Censoring status (numeric)

karno

Karnofsky performance score (numeric)

diagtime

Time from diagnosis to randomization, in months (numeric)

age

Age in years (numeric)

prior

Prior therapy indicator (0 = no, 10 = yes) (numeric)

Details

The dataset name has been kept as 'veterans_lung_cancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.

Source

Data taken from the survival package version 3.8-3
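
Examples

Survival data of this kind are typically summarised with a Kaplan-Meier curve. The sketch below is a hand-written product-limit estimator in base R, intended only to show the calculation. The helper name km_estimate is an illustrative assumption, and the code assumes that status equals 1 for an observed death and 0 for a censored time. For real analyses the survival package provides a full implementation.

```r
# Hypothetical helper (not part of PulmoDataSets): Kaplan-Meier estimate of
# the survival function. Assumes status == 1 marks an event, 0 censoring.
km_estimate <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  at_risk <- vapply(event_times, function(t) sum(time >= t), numeric(1))
  events  <- vapply(event_times, function(t) sum(time == t & status == 1), numeric(1))
  data.frame(
    time     = event_times,
    at_risk  = at_risk,
    events   = events,
    survival = cumprod(1 - events / at_risk)
  )
}

if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("veterans_lung_cancer_df", package = "PulmoDataSets")
  km <- with(veterans_lung_cancer_df, km_estimate(time, status))
  print(head(km))
}
```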


View Available Datasets in PulmoDataSets

Description

This function lists all datasets available in the 'PulmoDataSets' package. If the 'PulmoDataSets' package is not loaded, it stops and shows an error message. If no datasets are available, it returns a message and an empty vector.

Usage

view_datasets_pulmo()

Value

A character vector with the names of the available datasets. If no datasets are found, it returns an empty character vector.

Examples

if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  library(PulmoDataSets)
  view_datasets_pulmo()
}

Copenhagen Whooping Cough 1900-1937

Description

This dataset, whooping_cough_dk_df, is a data frame containing weekly incidence data of whooping cough in Copenhagen, Denmark between January 1900 and December 1937. It includes 1,982 weekly observations across 8 demographic and epidemiological variables.

Usage

data(whooping_cough_dk_df)

Format

A data frame with 1,982 weekly observations and 8 variables:

date

Date of observation (factor)

births

Number of births (integer)

day

Day of month (integer)

month

Month (integer 1-12)

year

Year (integer 1900-1937)

cases

Number of whooping cough cases (integer)

deaths

Number of whooping cough deaths (integer)

popsize

Population size (numeric)

Details

The dataset name has been kept as 'whooping_cough_dk_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a standard data frame. The original content has not been modified in any way.

Source

Data taken from the epimdr package version 0.6-5
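
Examples

The date column is stored as a factor, which is inconvenient for plotting or time-based filtering. Rather than guessing the text format of that factor, the sketch below rebuilds a Date column from the integer day, month, and year columns. The helper name add_obs_date and the new column obs_date are illustrative assumptions.

```r
# Hypothetical helper (not part of PulmoDataSets): add a Date column built
# from the integer year, month, and day columns.
add_obs_date <- function(df) {
  stopifnot(all(c("year", "month", "day") %in% names(df)))
  df$obs_date <- as.Date(ISOdate(df$year, df$month, df$day))
  df
}

if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("whooping_cough_dk_df", package = "PulmoDataSets")
  wc <- add_obs_date(whooping_cough_dk_df)
  print(head(wc[, c("obs_date", "cases", "deaths")]))
}
```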


Philadelphia Whooping Cough 1925-1947

Description

This dataset, whooping_cough_phila_df, is a data frame containing weekly incidence data of whooping cough in Philadelphia between 1925 and 1947, with 1,200 weekly observations across 5 variables.

Usage

data(whooping_cough_phila_df)

Format

A data frame with 1,200 weekly observations and 5 variables:

YEAR

Year of observation (integer)

WEEK

Week number (integer)

PHILADELPHIA

Weekly incidence count of whooping cough cases (integer)

TIME

Time index (numeric)

TM

Time marker (integer)

Details

The dataset name has been kept as 'whooping_cough_phila_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a standard data frame. The original content has not been modified in any way.

Source

Data taken from the epimdr package version 0.6-5
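
Examples

Weekly counts are often aggregated to yearly totals to inspect longer-term epidemic cycles. The sketch below sums the PHILADELPHIA column within each YEAR using base R. The helper name yearly_cases is an illustrative assumption, and missing weekly counts, if any, are ignored in the sums.

```r
# Hypothetical helper (not part of PulmoDataSets): total reported cases per
# calendar year, ignoring missing weekly counts.
yearly_cases <- function(df) {
  stopifnot(all(c("YEAR", "PHILADELPHIA") %in% names(df)))
  tapply(df$PHILADELPHIA, df$YEAR, sum, na.rm = TRUE)
}

if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("whooping_cough_phila_df", package = "PulmoDataSets")
  print(yearly_cases(whooping_cough_phila_df))
}
```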


Whooping Cough Deaths in London (1740-1881)

Description

This dataset, whooping_cough_ts, is a time series object containing annual counts of deaths from whooping cough in London from 1740 to 1881, with three measurement variables recorded each year.

Usage

data(whooping_cough_ts)

Format

A multivariate time series with 142 annual observations from 1740 to 1881 and 3 variables:

wcough

Number of whooping cough deaths (integer)

ratio

Death ratio (numeric)

alldeaths

Total deaths from all causes (integer)

Details

The dataset name has been kept as 'whooping_cough_ts' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'ts' indicates that this is a time series object. The original content has not been modified in any way.

Source

Data taken from the DAAG package version 1.25.6
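
Examples

Columns of a multivariate time series can be extracted by name while keeping the yearly time index. The sketch below derives the share of all deaths attributed to whooping cough from the wcough and alldeaths columns. The helper name whooping_share is an illustrative assumption, and the result is not claimed to match the ratio column.

```r
# Hypothetical helper (not part of PulmoDataSets): whooping cough deaths as
# a fraction of deaths from all causes, returned as a time series.
whooping_share <- function(x) {
  stopifnot(is.ts(x), all(c("wcough", "alldeaths") %in% colnames(x)))
  x[, "wcough"] / x[, "alldeaths"]
}

if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  data("whooping_cough_ts", package = "PulmoDataSets")
  share <- whooping_share(whooping_cough_ts)
  # Restrict the series to the years 1800 to 1810
  print(window(share, start = 1800, end = 1810))
}
```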