Statistics and Results Concepts

Stats Result Set

CE uses a Calculations process to generate Stats Result sets. A Stats Result set is the full set of calculated values for an Exam answer key compared against one or more sets of Registrant responses registered to Exam Sessions. The Stats Result set contains Candidate performance against the Exam, Sections and Competencies, as well as aggregated results for the Exam, Sections, Items and Competencies. The data in the Stats Result set are used by various reports and exports. The Stats Result set is stored in the database and can be overwritten by rerunning the Calculations process.

Statistics Scope

At its base level, CE tracks each Candidate's performance against each Item. This Candidate-Item level is the base scope of the statistics CE calculates. CE then aggregates those Candidate-Item statistics to generate higher-scoped statistics, such as Candidate-Section or Candidate-Exam statistics. CE also aggregates the Candidate statistics to create Exam-scoped statistics, such as Exam-Competency or Exam-Section statistics.
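The aggregation from the base scope to higher scopes can be illustrated with a minimal sketch. The row layout, identifiers and the `aggregate` helper below are hypothetical and are not CE's schema; the sketch only shows how one set of Candidate-Item rows can be rolled up along different keys.

```python
from collections import defaultdict

# Hypothetical Candidate-Item rows: (candidate, section, item, is_correct).
rows = [
    ("C1", "S1", "I1", 1),
    ("C1", "S1", "I2", 0),
    ("C1", "S2", "I3", 1),
    ("C2", "S1", "I1", 1),
    ("C2", "S1", "I2", 1),
    ("C2", "S2", "I3", 0),
]

def aggregate(rows, key):
    """Sum the correct flag of Candidate-Item rows, grouped by key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)

# Higher scopes are the same base rows grouped by a different key.
candidate_section = aggregate(rows, lambda r: (r[0], r[1]))
candidate_exam = aggregate(rows, lambda r: r[0])
exam_section = aggregate(rows, lambda r: r[1])
```

Here `candidate_exam` holds the number of correct Items per Candidate, and `exam_section` holds the number of correct responses per Section across all Candidates.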

StatsID

For each calculation against an exam, a unique set of statistics is created and stored in the database. This unique set of statistics is keyed by the StatsID.

Include In Stats

Include in Stats is used to indicate whether a Candidate is auditing the Exam (false) or taking the Exam (true). If the Candidate is auditing the Exam, the Candidate has their statistics calculated, but those statistics are not aggregated up to Exam statistics. Include in Stats is also used to indicate whether an Item is experimental on the Exam (false) or included on the Exam (true). If the Item is experimental on the Exam, the Item has its statistics calculated, but those statistics are not aggregated up to Exam statistics.

Statistics Types

CE calculates two sets of statistics: 1) Raw statistics, which are the counts of whether Candidates responded to an Item correctly, responded incorrectly or skipped the Item. 2) Value statistics, which are the point totals allotted for answering the Item correctly, answering it incorrectly or skipping it.

Raw Statistics

QtyCorrect

Quantity Correct is the count of how many Items are answered correctly in the scope of the statistic. For an individual Candidate for an individual Item, this value is either 0 or 1. For the scope of Candidate-Exam, the value is 0 to the number of Items on the Exam. This value is always a whole number. For Multiple Choice, responding to an Option that is flagged correct marks the Candidate-Item as correct. For Essay, achieving a Value >= the Pass Mark of the Item marks the Candidate-Item as correct.

QtyIncorrect

Quantity Incorrect is the count of how many Items are answered incorrectly in the scope of the statistic. For an individual Candidate for an individual Item, this value is either 0 or 1. For the scope of Candidate-Exam, the value is 0 to the number of Items on the Exam. This value is always a whole number. For Multiple Choice, responding to an Option that is not flagged correct marks the Candidate-Item as incorrect. For Essay, achieving a Value < the Pass Mark of the Item marks the Candidate-Item as incorrect.

QtySkipped

Quantity Skipped is the count of how many Items are not answered in the scope of the statistic. For an individual Candidate for an individual Item, this value is either 0 or 1. For the scope of Candidate-Exam, the value is 0 to the number of Items on the Exam. This value is always a whole number. For Multiple Choice, not selecting any Option marks the Candidate-Item as skipped. For Essay, not responding to the Item marks the Candidate-Item as skipped.

RawPercent

Raw Percent is QtyCorrect divided by the number of Items in the scope of the statistic, expressed as a percentage. For an individual Candidate for an individual Item, this value is either 0 or 100. For the scope of Candidate-Exam, the value is anywhere from 0 to 100. This value is always between 0 and 100.
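The four Raw statistics for a single scope can be sketched as follows. The `raw_stats` function and the outcome labels are hypothetical names used for illustration; they are not CE identifiers.

```python
def raw_stats(outcomes):
    """Compute Raw statistics for one scope.

    outcomes: one entry per Item in the scope, each being
    'correct', 'incorrect' or 'skipped' (hypothetical labels).
    """
    qty_correct = outcomes.count("correct")
    qty_incorrect = outcomes.count("incorrect")
    qty_skipped = outcomes.count("skipped")
    n_items = len(outcomes)
    # RawPercent is QtyCorrect over the number of Items, as a percentage.
    raw_percent = 100.0 * qty_correct / n_items if n_items else 0.0
    return {
        "QtyCorrect": qty_correct,
        "QtyIncorrect": qty_incorrect,
        "QtySkipped": qty_skipped,
        "RawPercent": raw_percent,
    }

# A Candidate-Exam scope with four Items.
stats = raw_stats(["correct", "correct", "incorrect", "skipped"])
```

For a Candidate-Item scope the list has a single entry, so RawPercent can only be 0 or 100.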

Value Statistics

ValueCorrect

Value Correct is the total value of all Items that are answered correctly in the scope of the statistic. For an individual Candidate for an individual Item, this value is the value that Candidate achieved if they answered the Item correctly. For the scope of Candidate-Exam, the value is the total points awarded over all Items on the Exam that are answered correctly. For Multiple Choice, responding to an Option that is flagged correct awards the points of that Option. For Essay, the value is the points awarded, up to the maximum value of that Item.

ValueIncorrect

Value Incorrect is the total value of all Items that are answered incorrectly in the scope of the statistic. For an individual Candidate for an individual Item, this value is the value that Candidate achieved if they answered the Item incorrectly (typically 0, but some tests penalize incorrect responses). For the scope of Candidate-Exam, the value is the total points awarded over all Items on the Exam that are answered incorrectly. For Multiple Choice, responding to an Option that is not flagged correct awards the points of that Option. For Essay, the value is 0.

ValueSkipped

Value Skipped is the total value of all Items that are not responded to in the scope of the statistic. For an individual Candidate for an individual Item, this value is the value that Candidate achieved if they skipped the Item (typically 0, but some tests penalize skipped responses). For the scope of Candidate-Exam, the value is the total points awarded over all skipped Items on the Exam. For Multiple Choice, skipping the Item awards the points set as the Item Skipped value. For Essay, the value is 0.

ValueTotal

Value Total is the sum of ValueCorrect, ValueIncorrect and ValueSkipped for the scope of the statistic.

ValueMax

Value Max is the maximum achievable points for the scope of the statistic. For a Multiple Choice Item it is the maximum value among its Options; for an Essay Item it is the maximum value set for the Item. For a Section it is the sum of the Value Max of all Items in the Section.

ValuePercent

Value Percent is ValueTotal / ValueMax for the scope of the statistic, expressed as a percentage.
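The Value statistics for one scope can be sketched in the same way. The `value_stats` function and its tuple layout are hypothetical; the multiplication by 100 follows the CandidateValuePercent definition given later on this page.

```python
def value_stats(items):
    """Compute Value statistics for one scope.

    items: one (outcome, value_awarded, value_max) tuple per Item, where
    outcome is 'correct', 'incorrect' or 'skipped' (hypothetical layout).
    """
    totals = {"correct": 0.0, "incorrect": 0.0, "skipped": 0.0}
    value_max = 0.0
    for outcome, awarded, item_max in items:
        totals[outcome] += awarded
        value_max += item_max
    # ValueTotal is the sum of ValueCorrect, ValueIncorrect and ValueSkipped.
    value_total = sum(totals.values())
    value_percent = 100.0 * value_total / value_max if value_max else 0.0
    return {
        "ValueCorrect": totals["correct"],
        "ValueIncorrect": totals["incorrect"],
        "ValueSkipped": totals["skipped"],
        "ValueTotal": value_total,
        "ValueMax": value_max,
        "ValuePercent": value_percent,
    }

# One correct Item worth 2 points, one penalized incorrect Item, one skipped Item.
stats = value_stats([
    ("correct", 2.0, 2.0),
    ("incorrect", -0.5, 1.0),
    ("skipped", 0.0, 1.0),
])
```

The example includes a negative value for the incorrect Item to show how a penalty lowers ValueTotal without changing ValueMax.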

Rank and Decile

Ranking

Ranking is the position of a value in the list of all values for the scope of the statistic, ordered descending. For example, the 3rd highest value achieved has a Ranking of 3.
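A minimal sketch of descending ranking follows. How CE ranks tied values is not stated on this page; the sketch assumes tied values share the same (best) Ranking, and `rank_descending` is a hypothetical name.

```python
def rank_descending(values):
    """Return the Ranking of each value, where 1 is the highest value.

    Assumption: tied values share the rank of their first position
    in the descending list.
    """
    ordered = sorted(values, reverse=True)
    # list.index returns the first matching position, so ties share a rank.
    return [ordered.index(v) + 1 for v in values]

ranks = rank_descending([70, 90, 85, 85, 60])
```

With this tie rule the two values of 85 both receive a Ranking of 2, and the next value (70) receives a Ranking of 4.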

Decile

Decile divides all of the values for the scope of the statistic into 10 equal-sized groups.

Standard Deviation and Variance

Note: for Standard Deviation and Variance calculations, CE uses the StDev and VAR functions. The following describes the differences between the sample and population methods.

STDEV is used when the group of numbers being evaluated is only a partial sampling of the whole population. The denominator for dividing the sum of squared deviations is N-1, where N is the number of observations (a count of items in the data set). Technically, subtracting the 1 is referred to as “unbiased.”

STDEVP is used when the group of numbers being evaluated is complete, that is, the entire population of values. In this case the 1 is NOT subtracted, and the denominator for dividing the sum of squared deviations is simply N itself. Technically, this is referred to as “biased.” Remembering that the P in STDEVP stands for “population” may be helpful. Since the data set is not a mere sample but consists of ALL the actual values, this standard deviation function can return a more precise result.
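The difference between the two denominators can be seen with Python's standard `statistics` module, where `stdev` and `variance` divide by N-1 (like STDEV and VAR) and `pstdev` and `pvariance` divide by N (like STDEVP and VARP). The scores below are made-up illustration data.

```python
import statistics

# Made-up scores: mean is 5, sum of squared deviations is 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = statistics.variance(scores)       # 32 / (8 - 1)
sample_sd = statistics.stdev(scores)           # square root of the sample variance
population_var = statistics.pvariance(scores)  # 32 / 8
population_sd = statistics.pstdev(scores)      # square root of the population variance
```

For the same data, the sample (N-1) result is always slightly larger than the population (N) result, and the gap shrinks as the number of observations grows.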

Reliability

Reliability in statistics and psychometrics is the overall consistency of a measure. A measure is said to have a high reliability if it produces similar results under consistent conditions.

We have included a Reliability spreadsheet that shows examples of how we calculate various reliability metrics as well as biserials.

Skewness and Kurtosis

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.

Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers.

We have included Skewness and Kurtosis (Kurtosis.xlsx) spreadsheets that show examples of how we calculate Skewness and Kurtosis.

Adverse Impact

Steps for calculating Adverse Impact:

- At each score, calculate the pass rate percentage for each subgroup: divide the number of persons in each subgroup (ethnicity subgroups and gender subgroups) that pass at that score by the total number of people in that subgroup.
- Identify the highest percentage.
- Divide each of the other percentages by the highest percentage. If the result is less than 80%, then that group is adversely impacted.

For example, using the sample at the bottom of this document, the following subset has 16 Caucasian, 92 Black and 111 Hispanic. (The total number of people in each subgroup acts as the denominator for step 1.) For demonstrative purposes, pretend those are the only three groups.

- If the passpoint were set at a score of 69.8: 0 out of 16 Caucasian passed = 0%; 1 out of 92 Black passed = 1.1%; 1 out of 111 Hispanic passed = 0.9%.
- The highest is 1.1% for Black.
- Calculate the ratios:
  * Caucasian 0/1.1 = 0%: less than 80%, so there is Adverse Impact (thus the * next to it)
  * Hispanic 0.9/1.1 = 81.8%: more than 80%, so there is no Adverse Impact (therefore no * next to it)

A second example:

- For the score of 66.1: 3 out of 16 Caucasian passed = 18.75%; 6 out of 92 Black passed = 6.52%; 5 out of 111 Hispanic passed = 4.50%.
- The highest percentage is 18.75% for Caucasian.
- Calculate the ratios:
  * Black 6.52/18.75 = 34.8%: less than 80%, so there is Adverse Impact
  * Hispanic 4.5/18.75 = 24%: less than 80%, so there is Adverse Impact

[Image: adverseimpact.png]
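The Adverse Impact steps can be sketched as a small function. The `adverse_impact` name, its dictionary layout and the handling of the case where nobody passes are assumptions made for illustration; the counts are those of the second example (score of 66.1).

```python
def adverse_impact(passed, totals, threshold=0.80):
    """Return {group: (pass_rate, ratio_to_highest, adversely_impacted)}.

    passed: number of people in each group who pass at the score.
    totals: total number of people in each group.
    """
    # Step 1: pass rate for each subgroup.
    rates = {group: passed[group] / totals[group] for group in totals}
    # Step 2: highest pass rate.
    highest = max(rates.values())
    result = {}
    for group, rate in rates.items():
        # Step 3: ratio to the highest rate; below the threshold means impact.
        # Assumption: if nobody passes in any group, no group is flagged.
        ratio = rate / highest if highest else 1.0
        result[group] = (rate, ratio, ratio < threshold)
    return result

totals = {"Caucasian": 16, "Black": 92, "Hispanic": 111}
at_66_1 = adverse_impact({"Caucasian": 3, "Black": 6, "Hispanic": 5}, totals)
```

The sketch divides unrounded pass rates, so ratios can differ slightly from hand calculations that round each percentage first.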

Alternate Scores

Exam

For each writing of an Exam by a set of Candidates, we calculate an ExamMaxValue (ceStatsExam.ValueMax). This is the maximum number of points a Candidate can achieve on the Exam if they are awarded full points for all Items on the Exam.

The Exam also has the ExamMeanValue (ceStatsExam.ValueMean). This is the Mean of all CandidateValueTotal across the Exam.

The Exam also has the ExamStDevValue (ceStatsExam.ValueStDev). This is the Standard Deviation of all CandidateValueTotal across the Exam.

Candidate

For each Candidate taking the Exam, we calculate a CandidateValueTotal (ceStatsCandidate.ValueTotal). This is the sum of all points awarded to the Candidate across all items on the Exam.

We also calculate a CandidateValuePercent (ceStatsCandidate.ValuePercent), which is CandidateValueTotal / ExamMaxValue * 100.

Each Candidate also gets a zScore (ceStatsCandidate.zScore) calculated. This is calculated by:
(CandidateValueTotal - ExamMeanValue) / ExamStDevValue
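The zScore calculation can be sketched as below. The sketch assumes ExamStDevValue is the sample standard deviation (N-1), in line with the note that CE uses the StDev function; `z_scores` is a hypothetical name and the totals are made-up data.

```python
import statistics

def z_scores(candidate_totals):
    """Return the zScore of each CandidateValueTotal on one Exam."""
    exam_mean = statistics.mean(candidate_totals)
    # Assumption: sample standard deviation (N-1 denominator), like StDev.
    exam_stdev = statistics.stdev(candidate_totals)
    return [(total - exam_mean) / exam_stdev for total in candidate_totals]

# Three Candidates: mean 50, sample standard deviation 10.
scores = z_scores([40, 50, 60])
```

A zScore of 0 means the Candidate scored exactly the Exam mean; positive and negative values count standard deviations above and below it. The function needs at least two Candidates with differing totals, otherwise the standard deviation is undefined or zero.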

Alternate Scores

To calculate Alternate Scores (including t-Scores), CE uses an AlternateBase, AlternateStDev and AlternateMean.

CE supports several Alternate Scoring Systems. Depending on the Alternate Scoring System selected, these values are preset, or they can be entered manually. All values in the table below can be overridden by the end user. The “True” system applies no Alternate Scores. The AlternateBase is initially blank. If no value is provided, then the ExamMaxValue is used.

We have the following systems loaded with values:

System          AlternateBase  AlternateMean    AlternateStDev
True (Default)  ?              -                -
Custom          ?              0 or Calculated  0 or Calculated
Z Scores        ?              0                1
T Scores        ?              50               10
CECB            ?              500              100
NCE Scores      ?              50               21.6
Wechsler        ?              10               3
Deviation       ?              100              15
Otis            ?              100              15

Candidates have 4 alternate values calculated.

ValueAlternate

This would be the t-Score
CandidateValueTotal * ExamMaxValue / AlternateBase

ValueNormalized

zScore * AlternateStDev + AlternateMean
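As a small worked sketch of this formula, the function below rescales a zScore with the AlternateMean and AlternateStDev of a chosen system. The function name is hypothetical; the example values are the T Scores (50, 10) and Deviation (100, 15) rows of the table above.

```python
def value_normalized(z_score, alternate_mean, alternate_stdev):
    """Rescale a zScore onto an alternate scale: z * StDev + Mean."""
    return z_score * alternate_stdev + alternate_mean

# A Candidate 1.5 standard deviations above the mean, on the T Scores scale.
t_score = value_normalized(1.5, alternate_mean=50, alternate_stdev=10)

# A Candidate 1 standard deviation below the mean, on the Deviation scale.
deviation_score = value_normalized(-1.0, alternate_mean=100, alternate_stdev=15)
```

A Candidate at the Exam mean (zScore of 0) always lands exactly on the AlternateMean of the selected system.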

ValueNormalizedPercent

This expresses the ValueNormalized as a percent.
(zScore * AlternateStDev + AlternateMean) / CandidateValueTotal * 100

ValueNormalizedAlternate

(zScore * AlternateStDev + AlternateMean) * AlternateBase / 100

Custom TScore Calculation

CE can be set up to generate TScores via custom methods. One example of a custom method involves setting Alternate Mean and Alternate Standard Deviation values for all Competencies on the Exam. The Competency TScores are then aggregated, and an Exam Alternate Mean and Exam Alternate Standard Deviation are applied to create an Exam TScore. See Alternate Competency Values.

Biserial Calculations

Item Writing currently calculates three Biserial values: Point Biserial (PBis), Point Biserial Corrected (PBisc) and Biserial Corrected (Bisc). These values are calculated by correlating Candidate performance on an Item to Candidate performance over all Items in a specific group. Exam Biserials are calculated by correlating Candidate performance on an Item to Candidate performance on all Items on the Exam. Section Biserials are calculated by correlating Candidate performance on an Item to Candidate performance on all Items in the same Section of the Exam. Competency Biserials are calculated by correlating Candidate performance on an Item to Candidate performance on all Items linked to the same Competency on the Exam.

The calculations are explained below for Biserials for the Exam. For Section and Competency versions the scope of the aggregation changes from all Items to only those items in the Section or those items in the Competency respectively.

Point Biserial

m1: Mean of correct on Exam for all Candidates that answered the Item correctly.
m0: Mean of correct on Exam for all Candidates that answered the Item incorrectly.
n1: Count of all Candidates that answered the Item correctly.
n0: Count of all Candidates that answered the Item incorrectly.
s: Standard deviation of Candidate correct on Exam.
n: Total number of Candidates on Exam.
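The formula itself is not shown on this page. The sketch below uses the standard point-biserial formula built from exactly these symbols, (m1 - m0) / s * sqrt(n1 * n0 / (n * (n - 1))), which matches a sample (N-1) standard deviation. Whether CE uses this exact variant is an assumption, as are the function name and the simplification that every Candidate answered the Item either correctly or incorrectly (so n = n1 + n0).

```python
import math
import statistics

def point_biserial(item_correct, exam_correct):
    """Point biserial of one Item against the Exam.

    item_correct: 1 or 0 per Candidate (Item answered correctly or not).
    exam_correct: each Candidate's count of correct Items on the Exam.
    Requires at least one Candidate in each group.
    """
    ones = [e for c, e in zip(item_correct, exam_correct) if c == 1]
    zeros = [e for c, e in zip(item_correct, exam_correct) if c == 0]
    n1, n0 = len(ones), len(zeros)
    n = n1 + n0  # assumption: no skipped responses
    m1 = statistics.mean(ones)
    m0 = statistics.mean(zeros)
    s = statistics.stdev(exam_correct)  # sample standard deviation (N-1)
    return (m1 - m0) / s * math.sqrt(n1 * n0 / (n * (n - 1)))

# Made-up data for five Candidates.
pbis = point_biserial([1, 1, 0, 1, 0], [5, 4, 2, 3, 1])
```

With this variant the result equals the Pearson correlation between the 0/1 Item outcome and the Exam score, which is a convenient way to cross-check an implementation.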

Point Biserial Corrected

PBis: Item Point Biserial.
sdTest: Standard Deviation of Candidate raw point totals on all Included Items on Exam.
varTest: Variance of Candidate raw point totals on all Included Items on Exam.
sdItem: Standard Deviation of all Candidate raw point totals on Item.
varItem: Variance of all Candidate raw point totals on Item.
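The corrected formula is also not shown on this page. A commonly used correction that takes exactly these inputs removes the Item's own contribution from the total: (PBis * sdTest - sdItem) / sqrt(varTest + varItem - 2 * PBis * sdTest * sdItem). The sketch below implements that form; that CE uses this exact form is an assumption, and the function name is hypothetical.

```python
import math

def point_biserial_corrected(pbis, sd_test, sd_item):
    """Item-total correlation with the Item's own score removed from the total.

    Assumption: this common correction is the one meant by PBisc.
    """
    var_test = sd_test ** 2
    var_item = sd_item ** 2
    numerator = pbis * sd_test - sd_item
    denominator = math.sqrt(var_test + var_item - 2 * pbis * sd_test * sd_item)
    return numerator / denominator

# Made-up inputs: Item outcomes [1, 1, 0, 1, 0] against Exam totals [5, 4, 2, 3, 1]
# give PBis = 3 / sqrt(12), varTest = 2.5 and varItem = 0.3 (sample variances).
pbisc = point_biserial_corrected(3 / math.sqrt(12), math.sqrt(2.5), math.sqrt(0.3))
```

For these inputs the result matches the plain correlation between the Item and the total of the remaining Items, which is what the correction is meant to reproduce. The corrected value is lower than PBis because the Item no longer correlates with itself.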

Biserial Corrected

p: Difficulty. Quantity of correct Candidate responses divided by the quantity of responses.
q: 1 - p
PBisc: Point Biserial Corrected.
meanTest: Mean (average) of Candidate raw point totals on all Included Items on Exam.
sdTest: Standard Deviation of Candidate raw point totals on all Included Items on Exam.
NormInv: Returns the inverse of the normal cumulative distribution for the specified mean and standard deviation. Called as NormInv(p, meanTest, sdTest).
d: The ABS function returns the absolute value (i.e. the modulus) of any supplied number. The syntax of the function is ABS(number), where the number argument is the numeric value that you want the modulus of.
Ordinal: The EXP function in Excel calculates the value of “e” raised to a given power. “e” is a constant equal to 2.71828182845904, the base of the natural logarithm. When it comes to the value of e, Excel uses a value of 2.718282.

Term Calculations

Term Calculations are run through the Term Entry tool. They are used to aggregate results across multiple separately weighted Stats Result sets. Candidate results are then ranked by their overall performance across the weighted Stats sets.

As an example, a Term Calculation can be run on a 40% weighted midterm Stats set along with a 60% weighted final Stats set. This will produce a Term Stats set aggregating the two source Stats sets.
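The 40% / 60% example can be sketched as a weighted sum. This page does not state which Candidate value is weighted or how it is combined; the sketch assumes each Candidate's ValuePercent is multiplied by the weight of its Stats set and summed, and all names and scores are hypothetical.

```python
def term_value(percent_by_set, weights):
    """Weighted sum of one Candidate's ValuePercent across Stats sets.

    Assumption: the Term result is a simple weighted sum of percentages.
    """
    return sum(percent_by_set[name] * weights[name] for name in weights)

weights = {"midterm": 0.40, "final": 0.60}

# Made-up ValuePercent results for two Candidates.
candidates = {
    "C1": {"midterm": 80.0, "final": 70.0},
    "C2": {"midterm": 60.0, "final": 90.0},
}

term = {name: term_value(percents, weights) for name, percents in candidates.items()}

# Rank Candidates by overall weighted performance, highest first.
ranked = sorted(term, key=term.get, reverse=True)
```

In this example C2 ranks ahead of C1 even though C1 did better on the midterm, because the final carries more weight.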