Statistics and Results Concepts
Stats Result Set
CE uses a Calculations process to generate Stats Result sets. A Stats Result set is all of the calculated values for an Exam answer key compared against a set or sets of Registrant responses registered to Exam Sessions. The Stats Result set contains Candidate performance against the Exam, Sections, and Competencies, as well as aggregated results for the Exam, Sections, Items, and Competencies. The data in the Stats Result set are used by various reports and exports. The Stats Result set is stored in the database and can be overwritten by rerunning the Calculation process.
Statistics Scope
At its base level, CE tracks each Candidate's performance against each Item. This is the base scope of the statistics CE calculates. CE then aggregates those Candidate-Item statistics to generate higher-scoped statistics, such as Candidate-Section or Candidate-Exam statistics. CE also aggregates the Candidate statistics to create Exam-scoped statistics, such as Exam-Competency or Exam-Section statistics.
StatsID
For each calculation against an exam, a unique set of statistics is created and stored in the database. This unique set of statistics is keyed by the StatsID.
Include In Stats
Include in Stats is used to indicate whether a Candidate is auditing the Exam (false) or taking the Exam (true). If the Candidate is auditing the Exam, the Candidate has their statistics calculated, but those statistics are not aggregated up to Exam statistics. Include in Stats is also used to indicate whether an Item is experimental on the Exam (false) or included on the Exam (true). If the Item is experimental on the Exam, the Item has its statistics calculated, but those statistics are not aggregated up to Exam statistics.
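As a rough illustration of the Include in Stats behaviour, the sketch below averages Candidate totals for an Exam while leaving out auditing Candidates. The record layout and the field names (`value_total`, `include_in_stats`) are assumptions made for this example and are not CE's actual schema.

```python
import statistics

# Hypothetical Candidate records; field names are illustrative only.
candidates = [
    {"id": "A", "value_total": 40.0, "include_in_stats": True},
    {"id": "B", "value_total": 30.0, "include_in_stats": True},
    {"id": "C", "value_total": 10.0, "include_in_stats": False},  # auditing
]

def exam_value_mean(records):
    """Mean of ValueTotal over Candidates flagged Include in Stats."""
    included = [r["value_total"] for r in records if r["include_in_stats"]]
    return statistics.mean(included)

# Candidate C still has an individual total, but it does not affect the Exam mean.
exam_mean = exam_value_mean(candidates)
```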
Statistics Types
CE calculates two sets of statistics: 1) Raw statistics, which are the counts of whether Candidates responded to an Item correctly, incorrectly, or skipped the Item. 2) Value statistics, which are the point totals allotted for answering the Item correctly, incorrectly, or skipping the Item.
Raw Statistics
QtyCorrect
Quantity Correct is the count of how many Items are answered correctly in the scope of the statistic. For an individual Candidate for an individual Item, this value is either 0 or 1. For the scope of Candidate-Exam the value is 0 to number of Items on the Exam. This value is always a whole number. For Multiple Choice, responding to an Option that is flagged correct marks the Candidate-Item as correct. For Essay, achieving a Value >= the Pass Mark of the Item is marked as correct.
QtyIncorrect
Quantity Incorrect is the count of how many Items are answered incorrectly in the scope of the statistic. For an individual Candidate for an individual Item, this value is either 0 or 1. For the scope of Candidate-Exam the value is 0 to number of Items on the Exam. This value is always a whole number. For Multiple Choice, responding to an Option that is not flagged correct marks the Candidate-Item as incorrect. For Essay, achieving a Value < the Pass Mark of the Item is marked as incorrect.
QtySkipped
Quantity Skipped is the count of how many Items are not answered in the scope of the statistic. For an individual Candidate for an individual Item, this value is either 0 or 1. For the scope of Candidate-Exam the value is 0 to number of Items on the Exam. This value is always a whole number. For Multiple Choice, not selecting any Option is marked as skipped. For Essay, not responding to the Item is marked as skipped.
RawPercent
Raw Percent is QtyCorrect divided by the Number of Items in the scope of the statistic, expressed as a percentage. For an individual Candidate for an individual Item, this value is either 0 or 100. For the scope of Candidate-Exam the value is 0 to 100. This value is always between 0 and 100.
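The Raw statistics for one Candidate over a set of Items can be sketched as follows. This is a minimal illustration, not CE's implementation: the `ItemResponse` type is hypothetical, and it is assumed that every Item in scope (including skipped Items) counts toward the RawPercent denominator.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ItemResponse:
    # Hypothetical record: True = correct, False = incorrect, None = skipped.
    is_correct: Optional[bool]

def raw_stats(responses: List[ItemResponse]) -> dict:
    """Count correct, incorrect and skipped Items and derive RawPercent."""
    qty_correct = sum(1 for r in responses if r.is_correct is True)
    qty_incorrect = sum(1 for r in responses if r.is_correct is False)
    qty_skipped = sum(1 for r in responses if r.is_correct is None)
    n_items = len(responses)
    raw_percent = 100.0 * qty_correct / n_items if n_items else 0.0
    return {
        "QtyCorrect": qty_correct,
        "QtyIncorrect": qty_incorrect,
        "QtySkipped": qty_skipped,
        "RawPercent": raw_percent,
    }

stats = raw_stats([
    ItemResponse(True), ItemResponse(False), ItemResponse(None), ItemResponse(True),
])
```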
Value Statistics
ValueCorrect
Value Correct is the total value of all Items that are answered correctly in the scope of the statistic. For an individual Candidate for an individual Item, this value is the value that Candidate achieved if they answered the Item correctly. For the scope of Candidate-Exam the value is the total points awarded over all Items on the Exam that are answered correctly. For Multiple Choice, responding to an Option that is flagged correct awards the points of that Option. For Essay, the value is the points awarded up to the maximum value of that Item.
ValueIncorrect
Value Incorrect is the total value of all Items that are answered incorrectly in the scope of the statistic. For an individual Candidate for an individual Item, this value is the value that Candidate achieved if they answered the Item incorrectly (typically 0, but some tests penalize incorrect responses). For the scope of Candidate-Exam the value is the total points awarded over all Items on the Exam that are answered incorrectly. For Multiple Choice, responding to an Option that is flagged incorrect awards the points of that Option. For Essay, the value is 0.
ValueSkipped
Value Skipped is the total value of all Items that are not responded to in the scope of the statistic. For an individual Candidate for an individual Item, this value is the value that Candidate achieved if they skipped the Item (typically 0, but some tests penalize skipped responses). For the scope of Candidate-Exam the value is the total points awarded over all skipped Items on the Exam. For Multiple Choice, skipping the Item awards the points set at the Item Skipped value. For Essay, the value is 0.
ValueTotal
Value Total is the sum of ValueCorrect, ValueIncorrect and ValueSkipped for the scope of the statistic.
ValueMax
Value Max is the maximum achievable points for the scope of the statistic. For a Multiple Choice Item it is the maximum value among its Options; for an Essay Item it is the maximum value set for the Item. For a Section it is the sum of the Value Max of all Items in the Section.
ValuePercent
Value Percent is Value Total / Value Max in the scope of the statistic, expressed as a percentage (multiplied by 100).
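The Value statistics can be sketched the same way. The example below is illustrative only: the `(outcome, points)` tuples are a hypothetical input shape, and ValuePercent is assumed to be on a 0-100 scale, in line with how CandidateValuePercent is described later in this document.

```python
def value_stats(awarded, value_max):
    """Sum points per outcome, then derive ValueTotal and ValuePercent.

    awarded: list of (outcome, points) pairs, one per Item, where outcome is
             "correct", "incorrect" or "skipped" (hypothetical input shape).
    value_max: maximum achievable points for the same scope.
    """
    totals = {"correct": 0.0, "incorrect": 0.0, "skipped": 0.0}
    for outcome, points in awarded:
        totals[outcome] += points
    value_total = totals["correct"] + totals["incorrect"] + totals["skipped"]
    value_percent = 100.0 * value_total / value_max if value_max else 0.0
    return {
        "ValueCorrect": totals["correct"],
        "ValueIncorrect": totals["incorrect"],
        "ValueSkipped": totals["skipped"],
        "ValueTotal": value_total,
        "ValueMax": value_max,
        "ValuePercent": value_percent,
    }

# Four Items worth 2, 1, 1 and 1 points; the incorrect response carries a penalty.
values = value_stats(
    [("correct", 2.0), ("correct", 1.0), ("incorrect", -0.5), ("skipped", 0.0)],
    value_max=5.0,
)
```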
Rank and Decile
Ranking
Ranking is based on the sequence of the value in a list of all values for the scope of the statistic ordered descending. If the value is the 3rd highest value achieved it would have a Ranking of 3.
Decile
Decile divides all of the values for the scope of the statistic into 10 equally sized groups; a value's Decile is the group it falls into.
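One possible reading of Ranking and Decile is sketched below. Several details are assumptions, since the text does not specify them: ties are broken by input order, Decile 1 holds the highest values, and groups are formed by position in the descending list. CE's actual tie handling and decile numbering may differ.

```python
def rank_and_decile(values):
    """Return a (rank, decile) pair for each value, in input order.

    Rank 1 is the highest value. Decile 1 is the top tenth (an assumption).
    Ties are broken by input order (also an assumption).
    """
    n = len(values)
    # Indices of the values, ordered from highest to lowest value.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    result = [None] * n
    for position, index in enumerate(order):
        rank = position + 1
        decile = position * 10 // n + 1
        result[index] = (rank, decile)
    return result

scores = [70, 95, 80, 60, 85, 90, 55, 75, 65, 50]
placements = rank_and_decile(scores)
# 85 is the 3rd highest of the ten scores, so its rank is 3.
```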
Standard Deviation and Variance
Note: for Standard Deviation and Variance calculations, CE uses the StDev and VAR functions. The following describes the differences between the sample and population methods.
STDEV is used when the group of numbers being evaluated is only a partial sampling of the whole population. The denominator for dividing the sum of squared deviations is N-1, where N is the number of observations (a count of items in the data set). Technically, subtracting the 1 is referred to as “non-biased.”
STDEVP is used when the group of numbers being evaluated is complete: it is the entire population of values. In this case, the 1 is NOT subtracted and the denominator for dividing the sum of squared deviations is simply N itself, the number of observations. Technically, this is referred to as “biased.” Remembering that the P in STDEVP stands for “population” may be helpful. Because the data set is not a mere sample but consists of ALL the actual values, this standard deviation function can return a more precise result.
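The difference between the two denominators can be seen with Python's standard `statistics` module, where `stdev` and `variance` divide by N-1 (like STDEV and VAR) and `pstdev` and `pvariance` divide by N (like STDEVP). The scores are made-up values for illustration.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, N = 8, mean = 5

# Sample forms: sum of squared deviations (32) divided by N - 1 = 7.
sample_variance = statistics.variance(scores)
sample_stdev = statistics.stdev(scores)

# Population forms: the same sum of squares divided by N = 8.
population_variance = statistics.pvariance(scores)
population_stdev = statistics.pstdev(scores)
```

For the same data the sample standard deviation is always slightly larger than the population one, and the gap shrinks as N grows.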
Reliability
Reliability in statistics and psychometrics is the overall consistency of a measure. A measure is said to have a high reliability if it produces similar results under consistent conditions.
We have included a Reliability spreadsheet that shows examples of how we calculate various reliability metrics as well as biserials.
Skewness and Kurtosis
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers.
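To make the two measures concrete, the sketch below computes moment-based skewness and excess kurtosis (kurtosis minus 3, so a normal distribution scores 0). This is only one of several common estimators and is an assumption made for illustration; the text does not state which estimator CE uses. The data must contain at least two distinct values.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a data set."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population form, dividing by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis

# A symmetric, flat data set: zero skewness and light tails (negative excess kurtosis).
symmetric_skew, symmetric_kurt = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])

# One large outlier on the right: positive skewness.
outlier_skew, outlier_kurt = skewness_and_excess_kurtosis([1, 1, 1, 1, 10])
```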
Alternate Scores
Exam
For each writing of an Exam by a set of Candidates, we calculate an ExamMaxValue (ceStatsExam.ValueMax). This is the maximum points a Candidate can achieve on the Exam if they are awarded full points for all Items on the Exam.
The Exam also has the ExamMeanValue (ceStatsExam.ValueMean). This is the Mean of all CandidateValueTotal across the Exam.
The Exam also has the ExamStDevValue (ceStatsExam.ValueStDev). This is the Standard Deviation of all CandidateValueTotal across the Exam.
Candidate
For each Candidate taking the Exam, we calculate a CandidateValueTotal (ceStatsCandidate.ValueTotal). This is the sum of all points awarded to the Candidate across all items on the Exam.
We also calculate a CandidateValuePercent (ceStatsCandidate.ValuePercent), which is CandidateValueTotal / ExamMaxValue * 100.
Each Candidate also gets a zScore (ceStatsCandidate.zScore) calculated. This is calculated by:
(CandidateValueTotal - ExamMeanValue) / ExamStDevValue
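A minimal sketch of the zScore calculation follows. The Candidate totals are made-up values, and the sample standard deviation is used on the assumption that ExamStDevValue follows the StDev (N-1) convention described earlier in this document.

```python
import statistics

# Hypothetical CandidateValueTotal values keyed by Candidate.
candidate_totals = {"A": 62.0, "B": 75.0, "C": 88.0}

totals = list(candidate_totals.values())
exam_mean_value = statistics.mean(totals)    # ExamMeanValue
exam_stdev_value = statistics.stdev(totals)  # ExamStDevValue (sample form, N-1)

# zScore = (CandidateValueTotal - ExamMeanValue) / ExamStDevValue
z_scores = {
    candidate: (total - exam_mean_value) / exam_stdev_value
    for candidate, total in candidate_totals.items()
}
```

A zScore of 0 means the Candidate scored exactly the Exam mean; positive and negative values give the distance above or below the mean in standard deviations.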
Alternate Scores
To calculate Alternate Scores (including t-Scores), CE uses an AlternateBase, AlternateStDev and AlternateMean.
CE supports several Alternate Scoring Systems. Depending on the Alternate Scoring System selected, these values are preset, or they can be entered manually. All values in the grid below can be overridden by the end user. The “True” system applies no Alternate Scores. The AlternateBase is initially blank; if no value is provided, the ExamMaxValue is used.
We have the following systems loaded with values:
System | AlternateBase | AlternateMean | AlternateStDev |
---|---|---|---|
True (Default) | ? | - | - |
Custom | ? | 0 or Calculated | 0 or Calculated |
Z Scores | ? | 0 | 1 |
T Scores | ? | 50 | 10 |
CECB | ? | 500 | 100 |
NCE Scores | ? | 50 | 21.6 |
Wechsler | ? | 10 | 3 |
Deviation | ? | 100 | 15 |
Otis | ? | 100 | 15 |
Candidates have 4 alternate values calculated.
ValueAlternate
This would be the t-Score
CandidateValueTotal * ExamMaxValue / AlternateBase
ValueNormalized
zScore * AlternateStDev + AlternateMean
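As an illustration of the ValueNormalized formula, the sketch below applies the AlternateMean and AlternateStDev of a few systems from the grid above to a zScore. The dictionary and function names are hypothetical, and only some of the listed systems are included.

```python
# (AlternateMean, AlternateStDev) pairs taken from the grid above.
ALTERNATE_SYSTEMS = {
    "Z Scores": (0.0, 1.0),
    "T Scores": (50.0, 10.0),
    "Deviation": (100.0, 15.0),
}

def value_normalized(z_score, system):
    """ValueNormalized = zScore * AlternateStDev + AlternateMean."""
    alternate_mean, alternate_stdev = ALTERNATE_SYSTEMS[system]
    return z_score * alternate_stdev + alternate_mean

# A Candidate 1.5 standard deviations above the Exam mean.
as_z = value_normalized(1.5, "Z Scores")
as_t = value_normalized(1.5, "T Scores")
as_deviation = value_normalized(1.5, "Deviation")
```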
ValueNormalizedPercent
This expresses the ValueNormalized as a percent.
(zScore * AlternateStDev + AlternateMean) / CandidateValueTotal * 100
ValueNormalizedAlternate
(zScore * AlternateStDev + AlternateMean) * AlternateBase / 100
Custom TScore Calculation
CE can be set up to generate TScores via custom methods. One example of a custom method involves setting Alternate Mean and Alternate Standard Deviation values for all Competencies on the Exam. The Competency TScores are then aggregated, and an Exam Alternate Mean and Exam Alternate Standard Deviation are applied to create an Exam TScore. See Alternate Competency Values.
Biserial Calculations
Item Writing currently calculates three Biserial values: Point Biserial (PBis), Point Biserial Corrected (PBisc) and Biserial Corrected (Bisc). These values are calculated by correlating Candidate performance on an Item to Candidate performance over all Items in a specific group. So, Exam Biserials are calculated by correlating Candidate performance on an Item to Candidate performance on all Items on the Exam. Section Biserials are calculated by correlating Candidate performance on an Item to Candidate performance on all Items in the same Section of the Exam. Competency Biserials are calculated by correlating Candidate performance on an Item to Candidate performance on all Items linked to the same Competency on the Exam.
The calculations are explained below for Biserials for the Exam. For Section and Competency versions the scope of the aggregation changes from all Items to only those items in the Section or those items in the Competency respectively.
Point Biserial
m1 | Mean of correct on Exam for all Candidates that answered Item correctly. |
m0 | Mean of correct on Exam for all Candidates that answered Item incorrectly. |
n1 | Count of all Candidates that answered Item correctly. |
n0 | Count of all Candidates that answered Item incorrectly. |
s | Standard deviation of Candidate correct on Exam. |
n | Total number of Candidates on Exam. |
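The terms above match the textbook point-biserial coefficient, which can be sketched as follows. This is an assumption about how the terms combine, not a confirmed transcription of CE's formula. It uses the sample standard deviation (N-1), consistent with the StDev convention noted earlier, and assumes every Candidate answered the Item either correctly or incorrectly, so that n = n1 + n0.

```python
import math
import statistics

def point_biserial(item_correct, exam_scores):
    """Textbook point-biserial between a 0/1 Item result and Exam scores."""
    ones = [s for c, s in zip(item_correct, exam_scores) if c]
    zeros = [s for c, s in zip(item_correct, exam_scores) if not c]
    n1, n0, n = len(ones), len(zeros), len(exam_scores)
    m1 = statistics.mean(ones)    # mean Exam score, Item answered correctly
    m0 = statistics.mean(zeros)   # mean Exam score, Item answered incorrectly
    s = statistics.stdev(exam_scores)  # sample standard deviation (N-1)
    # The n * (n - 1) factor pairs with the N-1 standard deviation.
    return (m1 - m0) / s * math.sqrt(n1 * n0 / (n * (n - 1)))

# Made-up data: six Candidates, their result on one Item and their Exam score.
item_correct = [1, 1, 0, 1, 0, 0]
exam_scores = [9, 8, 4, 7, 5, 3]
pbis = point_biserial(item_correct, exam_scores)
```

With this pairing of terms the result equals the Pearson correlation between the 0/1 Item result and the Exam score.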
Point Biserial Corrected
PBis | Item Point Biserial |
sdTest | Standard Deviation of Candidate raw point totals on all Included Items on Exam. |
varTest | Variance of Candidate raw point totals on all Included Items on Exam. |
sdItem | Standard Deviation of all Candidate raw point totals on Item. |
varItem | Variance of all Candidate raw point totals on Item. |
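The terms listed for Point Biserial Corrected match a commonly used correction that removes the Item's own contribution from the Exam total. The sketch below implements that correction as an assumption; it is not a confirmed transcription of CE's formula. The input values are made up and come from a six-Candidate example in which the Item is worth one point.

```python
import math

def point_biserial_corrected(pbis, sd_test, sd_item):
    """Item-excluded correction of the point biserial (assumed form).

    pbis: uncorrected Item point biserial.
    sd_test: standard deviation of Candidate totals on the Exam.
    sd_item: standard deviation of Candidate points on the Item.
    """
    var_test = sd_test ** 2
    var_item = sd_item ** 2
    numerator = pbis * sd_test - sd_item
    denominator = math.sqrt(var_test + var_item - 2 * pbis * sd_test * sd_item)
    return numerator / denominator

# Illustrative inputs: PBis = 6/sqrt(42), varTest = 5.6, varItem = 0.3.
pbisc = point_biserial_corrected(
    pbis=6 / math.sqrt(42),
    sd_test=math.sqrt(5.6),
    sd_item=math.sqrt(0.3),
)
```

The corrected value is lower than the uncorrected one because the Item no longer correlates with itself inside the total.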
Biserial Corrected
Term Calculations
Term Calculations are run through the Term Entry tool. They are used to aggregate results across multiple separately weighted Stats Result sets. Candidate results are then ranked by their overall performance across the weighted Stats Result sets.
As an example, a Term Calculation can be run on a 40% weighted midterm Stats Result set along with a 60% weighted final Stats Result set. This produces a Term Stats set that aggregates the two source Stats Result sets.
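The 40/60 example can be sketched as a weighted sum. How CE combines the sets is not spelled out here, so the sketch assumes the weights are applied to each Candidate's ValuePercent and that every Candidate appears in every source set. All names and figures are hypothetical.

```python
def term_results(weighted_sets):
    """Combine weighted Stats Result sets and rank Candidates.

    weighted_sets: list of (weight, {candidate: ValuePercent}) pairs.
    Returns a list of (candidate, term_percent) sorted from highest to lowest.
    """
    combined = {}
    for weight, percents in weighted_sets:
        for candidate, percent in percents.items():
            combined[candidate] = combined.get(candidate, 0.0) + weight * percent
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)

midterm = {"A": 80.0, "B": 60.0}  # weighted 40%
final = {"A": 70.0, "B": 90.0}    # weighted 60%
ranking = term_results([(0.4, midterm), (0.6, final)])
# Candidate B ranks first: 0.4 * 60 + 0.6 * 90 = 78 against 74 for Candidate A.
```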