The SGPdata package contains four exemplar data sets for
use with student growth percentile (SGP)
analyses. One of the data sets, sgpData, specifies data in
the WIDE format used with the lower level SGP functions
studentGrowthPercentiles and
studentGrowthProjections. Two of the data sets,
sgpData_LONG and sgptData_LONG, specify data in
the LONG format used by higher level functions like abcSGP,
prepareSGP, and analyzeSGP. The last data set,
sgpData_INSTRUCTOR_NUMBER, is a teacher-student lookup table
used to produce teacher level aggregates. The sections that follow
discuss each of the four data sets in greater depth.

The data set sgpData is an anonymized panel data set
comprising 5 years of annual, vertically scaled assessment data in WIDE
format. This exemplar data set models the format for data used with the
lower level studentGrowthPercentiles
and studentGrowthProjections
functions.
> head(sgpData)
Key: <ID>
        ID GRADE_2020 GRADE_2021 GRADE_2022 GRADE_2023 GRADE_2024 SS_2020 SS_2021 SS_2022 SS_2023 SS_2024
     <int>      <num>      <num>      <num>      <num>      <num>   <num>   <num>   <num>   <num>   <num>
1: 1000185         NA         NA         NA         NA          7      NA      NA      NA      NA     520
2: 1000486          3          4          5          6          7     524     548     607     592     656
3: 1000710          8         NA         NA         NA         NA     713      NA      NA      NA      NA
4: 1000715         NA         NA          4          5          6      NA      NA     469     492     551
5: 1000803         NA          5         NA         NA         NA      NA     558      NA      NA      NA
6: 1000957          5          6          7          8         NA     651     660     666     663      NA

The WIDE data format illustrated by sgpData and used
by the SGP package can accommodate any number of occurrences but must
follow a specific column order. Variable names are irrelevant; position
in the data set is what matters:
In sgpData above, the first column, ID,
provides the unique student identifier. The next 5 columns,
GRADE_2020, GRADE_2021, GRADE_2022,
GRADE_2023, and GRADE_2024, provide the grade level of
the student assessment score in each of the 5 years. The last 5 columns,
SS_2020, SS_2021, SS_2022, SS_2023,
and SS_2024, provide the scale scores associated with the
student in each of the 5 years. Most students do not have
5 years of test data, so years without a record show the missing value (NA).
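
Because position rather than name identifies each column, a wide data set can be split into its three parts by index alone. The following is a minimal sketch in base R, not code from the SGP package: the toy values, the column names, and the helper split_wide are all hypothetical.

```r
# Toy wide-format data with deliberately arbitrary column names.
# Column order: identifier, then grades (oldest to newest), then scores.
toy_wide <- data.frame(
  student = c(101L, 102L, 103L),
  g_a     = c(3, NA, 4),
  g_b     = c(4, 3, 5),
  score_a = c(510, NA, 560),
  score_b = c(545, 498, 590)
)

# Hypothetical helper: split a wide data set into its positional parts.
split_wide <- function(wide) {
  # Every column after the first is either a grade or a score column,
  # so the number of panels is half of the remaining column count.
  n_panels <- (ncol(wide) - 1L) / 2L
  stopifnot(n_panels == floor(n_panels))
  list(
    id     = wide[[1L]],
    grades = wide[, 1L + seq_len(n_panels), drop = FALSE],
    scores = wide[, 1L + n_panels + seq_len(n_panels), drop = FALSE]
  )
}

parts <- split_wide(toy_wide)
names(parts$grades)  # "g_a" "g_b"
names(parts$scores)  # "score_a" "score_b"
```

The split succeeds even though no column is named ID, GRADE, or SS, which is the sense in which variable names are irrelevant.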
Using wide-format data like sgpData with the SGP package
is, in general, straightforward.
> sgp_g4 <- studentGrowthPercentiles(
        panel.data=sgpData,
        sgp.labels=list(my.year=2015, my.subject="Reading"),
        percentile.cuts=c(1,35,65,99),
        grade.progression=c(3,4))

Please consult the SGP package documentation
for more comprehensive guidance on how to use sgpData
for SGP calculations.

The data set sgpData_LONG is an anonymized panel data
set comprising 5 years of annual, vertically scaled assessment data in
LONG format for two content areas (Reading and Mathematics). This exemplar
data set models the format for data used with the higher level functions
abcSGP,
prepareSGP,
analyzeSGP,
combineSGP,
summarizeSGP,
visualizeSGP,
and outputSGP.
> head(sgpData_LONG)
   VALID_CASE CONTENT_AREA      YEAR      ID LAST_NAME FIRST_NAME  GRADE SCALE_SCORE    ACHIEVEMENT_LEVEL       GENDER ETHNICITY FREE_REDUCED_LUNCH_STATUS ELL_STATUS IEP_STATUS GIFTED_AND_TALENTED_PROGRAM_STATUS SCHOOL_NUMBER                  SCHOOL_NAME  EMH_LEVEL DISTRICT_NUMBER                DISTRICT_NAME SCHOOL_ENROLLMENT_STATUS DISTRICT_ENROLLMENT_STATUS STATE_ENROLLMENT_STATUS
       <char>       <char>    <char>  <char>    <fctr>     <fctr> <char>       <num>               <char>       <fctr>    <fctr>                    <fctr>     <fctr>     <fctr>                             <fctr>         <int>                       <fctr>     <fctr>           <int>                       <fctr>                   <fctr>                     <fctr>                  <fctr>
1: VALID_CASE  MATHEMATICS 2021_2022 1000372   Daniels      Corey      3         435           Proficient Gender: Male  Hispanic   Free Reduced Lunch: Yes   ELL: Yes    IEP: No    Gifted and Talented Program: No          1851 Silk-Royal Elementary School Elementary             470 Apple Valley School District     Enrolled School: Yes     Enrolled District: Yes     Enrolled State: Yes
2: VALID_CASE  MATHEMATICS 2022_2023 1000372   Daniels      Corey      4         461           Proficient Gender: Male  Hispanic   Free Reduced Lunch: Yes   ELL: Yes    IEP: No    Gifted and Talented Program: No          1851 Silk-Royal Elementary School Elementary             470 Apple Valley School District     Enrolled School: Yes     Enrolled District: Yes     Enrolled State: Yes
3: VALID_CASE  MATHEMATICS 2023_2024 1000372   Daniels      Corey      5         444 Partially Proficient Gender: Male  Hispanic   Free Reduced Lunch: Yes   ELL: Yes    IEP: No    Gifted and Talented Program: No          1851 Silk-Royal Elementary School Elementary             470 Apple Valley School District     Enrolled School: Yes     Enrolled District: Yes     Enrolled State: Yes
4: VALID_CASE      READING 2021_2022 1000372   Daniels      Corey      3         523 Partially Proficient Gender: Male  Hispanic   Free Reduced Lunch: Yes   ELL: Yes    IEP: No    Gifted and Talented Program: No          1851 Silk-Royal Elementary School Elementary             470 Apple Valley School District     Enrolled School: Yes     Enrolled District: Yes     Enrolled State: Yes
5: VALID_CASE      READING 2022_2023 1000372   Daniels      Corey      4         540 Partially Proficient Gender: Male  Hispanic   Free Reduced Lunch: Yes   ELL: Yes    IEP: No    Gifted and Talented Program: No          1851 Silk-Royal Elementary School Elementary             470 Apple Valley School District     Enrolled School: Yes     Enrolled District: Yes     Enrolled State: Yes
6: VALID_CASE      READING 2023_2024 1000372   Daniels      Corey      5         473       Unsatisfactory Gender: Male  Hispanic   Free Reduced Lunch: Yes   ELL: Yes    IEP: No    Gifted and Talented Program: No          1851 Silk-Royal Elementary School Elementary             470 Apple Valley School District     Enrolled School: Yes     Enrolled District: Yes     Enrolled State: Yes

We recommend LONG formatted data for operational analyses.
Managing data in long format is simpler than managing data in the wide
format. For example, when updating analyses with another year of data,
the new data is appended to the bottom of the existing long data
set. All higher level functions in the SGP package are designed for use
with LONG format data. In addition, these functions often assume the
existence of state specific meta-data embedded in SGPstateData.
See the SGP package documentation (https://sgp.io) for more comprehensive guidance on
how to use sgpData_LONG for SGP calculations.
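
The yearly update described above can be sketched with two toy data frames. This is an illustration in base R under stated assumptions: the records are invented, only a subset of columns is shown, and an operational workflow may well use a different append routine.

```r
# Existing long data: one row per student, content area, and year (toy values)
long_data <- data.frame(
  VALID_CASE   = "VALID_CASE",
  CONTENT_AREA = "MATHEMATICS",
  YEAR         = "2022_2023",
  ID           = c("1001", "1002"),
  GRADE        = c("3", "4"),
  SCALE_SCORE  = c(435, 461),
  stringsAsFactors = FALSE
)

# The next year's records use exactly the same columns
new_year <- data.frame(
  VALID_CASE   = "VALID_CASE",
  CONTENT_AREA = "MATHEMATICS",
  YEAR         = "2023_2024",
  ID           = c("1001", "1002"),
  GRADE        = c("4", "5"),
  SCALE_SCORE  = c(452, 470),
  stringsAsFactors = FALSE
)

# Updating the data set is a row-wise append; no columns are added
updated <- rbind(long_data, new_year)
table(updated$YEAR)
```

In the wide format the same update would add a new grade column and a new score column and shift every positional reference, which is why the long layout is easier to maintain.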
There are 7 required variables when using LONG data with SGP
analyses: VALID_CASE, CONTENT_AREA,
YEAR, ID, SCALE_SCORE,
GRADE, and ACHIEVEMENT_LEVEL (only required if
running student growth projections). LAST_NAME and
FIRST_NAME are required if creating individual level
student growth and achievement plots. All other variables are
demographic/student categorization variables used for creating student
aggregates by the summarizeSGP
function.
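
Before running an analysis it can help to confirm that the required variables are present. The check below is a minimal sketch in base R; the helper missing_long_vars and the toy record are hypothetical and are not part of the SGP or SGPdata packages.

```r
# Variables the text lists as required for LONG data
required_vars <- c("VALID_CASE", "CONTENT_AREA", "YEAR", "ID",
                   "SCALE_SCORE", "GRADE", "ACHIEVEMENT_LEVEL")

# Hypothetical helper: return the required variables absent from a data set.
# ACHIEVEMENT_LEVEL is only checked when projections are requested.
missing_long_vars <- function(long, projections = TRUE) {
  needed <- if (projections) {
    required_vars
  } else {
    setdiff(required_vars, "ACHIEVEMENT_LEVEL")
  }
  setdiff(needed, names(long))
}

# Toy record that lacks ACHIEVEMENT_LEVEL
toy_long <- data.frame(
  VALID_CASE = "VALID_CASE", CONTENT_AREA = "READING", YEAR = "2023_2024",
  ID = "1001", GRADE = "5", SCALE_SCORE = 473,
  stringsAsFactors = FALSE
)

missing_long_vars(toy_long)                       # "ACHIEVEMENT_LEVEL"
missing_long_vars(toy_long, projections = FALSE)  # character(0)
```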
The sgpData_LONG data set contains data for 5 years
across 2 content areas (Reading and Mathematics).

The data set sgptData_LONG is an anonymized panel data
set comprising 8 windows (3 windows annually) of assessment data in LONG
format for 3 content areas (Early Literacy, Mathematics, and Reading).
This data set is similar to the sgpData_LONG data set, but
without the demographic variables and with an additional
DATE variable indicating the date associated with the
student assessment record.
> head(sgptData_LONG)
Key: <VALID_CASE, CONTENT_AREA, YEAR, ID, GRADE>
   VALID_CASE   CONTENT_AREA        YEAR        ID  GRADE       DATE SCALE_SCORE SCALE_SCORE_RASCH COUNTRY  STATE   SEM ACHIEVEMENT_LEVEL
       <char>         <char>      <char>    <char> <char>     <Date>       <num>             <num>  <char> <char> <num>            <char>
1: VALID_CASE EARLY_LITERACY 2014_2015.2  ANON_130    K.2 2015-01-14         622            0.3449      US     OH    55              <NA>
2: VALID_CASE EARLY_LITERACY 2014_2015.2 ANON_1314    1.2 2015-01-08         500           -0.6556      US     NJ    49              <NA>
3: VALID_CASE EARLY_LITERACY 2014_2015.2  ANON_133    K.2 2015-01-17         566           -0.1010      US     OH    57              <NA>
4: VALID_CASE EARLY_LITERACY 2014_2015.2 ANON_1429    2.2 2015-03-12         621            0.3368      US     WI    58              <NA>
5: VALID_CASE EARLY_LITERACY 2014_2015.2 ANON_1498    K.2 2015-01-09         577           -0.0129      US     IL    57              <NA>
6: VALID_CASE EARLY_LITERACY 2014_2015.2 ANON_1533    K.2 2015-01-23         443           -1.2131      US     IL    38              <NA>

The data set sgpData_INSTRUCTOR_NUMBER is an anonymized
student-instructor lookup table that provides instructor information
associated with each student's test record. Note that just as each
teacher can (and will) have more than one student associated with them, a
student can have more than one teacher associated with their test
record. That is, multiple teachers can be assigned to a student in a
single content area for a given year.
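
One way such a lookup table could feed a teacher level aggregate is sketched below in base R. This is an assumption-laden illustration, not the method used by summarizeSGP: the SGP column, its values, the instructor codes, and the choice of a weighted mean are all hypothetical.

```r
# Hypothetical student records with a growth percentile (invented values)
students <- data.frame(
  ID           = c("1", "2"),
  CONTENT_AREA = "MATHEMATICS",
  YEAR         = "2022_2023",
  SGP          = c(40, 70),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: student 1 is shared by instructors A and B
lookup <- data.frame(
  ID                = c("1", "1", "2"),
  CONTENT_AREA      = "MATHEMATICS",
  YEAR              = "2022_2023",
  INSTRUCTOR_NUMBER = c("A", "B", "B"),
  INSTRUCTOR_WEIGHT = c(0.2, 0.8, 1.0),
  stringsAsFactors = FALSE
)

# Joining on student, content area, and year yields one row per
# student-instructor pairing, so student 1 appears twice
linked <- merge(students, lookup, by = c("ID", "CONTENT_AREA", "YEAR"))

# Weighted mean SGP per instructor, using INSTRUCTOR_WEIGHT as the weight
by_instructor <- sapply(
  split(linked, linked$INSTRUCTOR_NUMBER),
  function(d) weighted.mean(d$SGP, d$INSTRUCTOR_WEIGHT)
)
by_instructor
```

The join produces three rows from two student records, which is the many-to-many relationship the paragraph above describes.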
> head(sgpData_INSTRUCTOR_NUMBER)
        ID CONTENT_AREA      YEAR INSTRUCTOR_NUMBER INSTRUCTOR_LAST_NAME INSTRUCTOR_FIRST_NAME INSTRUCTOR_WEIGHT INSTRUCTOR_ENROLLMENT_STATUS
    <char>       <char>    <char>            <char>               <fctr>                <fctr>             <num>                       <fctr>
1: 1000372  MATHEMATICS 2020_2021         185103004                 Kang                Alexis               1.0     Enrolled Instructor: Yes
2: 1000372  MATHEMATICS 2021_2022         185104002                Mills                  Karl               1.0     Enrolled Instructor: Yes
3: 1000372  MATHEMATICS 2022_2023         185105002             Intavong               Michael               0.2     Enrolled Instructor: Yes
4: 1000372  MATHEMATICS 2022_2023         185105004                Price                 Angel               0.8     Enrolled Instructor: Yes
5: 1000372      READING 2020_2021         185103003               Mccord             Guadalupe               1.0     Enrolled Instructor: Yes
6: 1000372      READING 2021_2022         185104001               Rivera               Kailynn               0.7     Enrolled Instructor: Yes

If you have a contribution or feature request for the SGPdata package, don’t hesitate to write or set up an issue on GitHub.