CRAME Research Reports and Conference Papers


Automatic item generation and artificial intelligence.
Hollis Lai & Mark J. Gierl (Maryland Conference 2017)

Generating rationales to support formative feedback in computer adaptive testing.
Mark J. Gierl & Okan Bulut (IACAT 2017)

Rationale generation: An expansion of the item generation framework.
Mark J. Gierl & Xinxin Zhang (NCME 2017)

The Achilles' heel of multiple-choice items: Distractors.
Okan Bulut, Mark J. Gierl, Qi Guo & Xinxin Zhang (NCME 2017)

Evaluating text similarity of generated items using graph theory.
Xinxin Zhang & Mark J. Gierl (NCME 2017)

Extreme scoring machine: Integrating deep language features for developing an essay scoring framework.
Syed Latifi & Mark J. Gierl (NCME 2017)

Implementing automated item generation in a large-scale medical licensing examination program: Lessons learned.
André De Champlain & Mark J. Gierl (ATP 2017)


Examining position effects in large-scale assessments using an SEM approach
Okan Bulut, Qi Guo, & Mark J. Gierl (ITC 2016)

Criterion-related validity of subscores in high school diploma examinations
Okan Bulut (ITC 2016)

Examining item and testlet position effects in computer-based alternate assessments
Okan Bulut, Xiaodong Hou, & Ming Lei (NCME 2016)

Understanding nonresponse behaviors of students with disabilities in alternate assessments
Okan Bulut, Ming Lei, Mehmet Kaplan, & Damien Cormier (AERA 2016)

Enhancements of simulated science laboratory assessments
Man-Wai Chu & Jacqueline P. Leighton (ITC 2016)

A panel structural equation model of the effects of trust and sympathy on learner outcomes
Jacqueline P. Leighton, Qi Guo, Man-Wai Chu & Wei Tang (AERA 2016)

Comparing the reliability coefficients from five approaches to reliability estimation: A Monte Carlo study
Wei Tang, Ying Cui & Jacqueline P. Leighton (AERA 2016)

A validation study of the Learning Errors and Formative Feedback (LEAFF) model
Wei Tang, Qi Guo & Jacqueline P. Leighton (NCME 2016)

Automated evaluation of psycho-educational assessment reports in school psychology using deep features of writing
Fahad Latifi, Mark Gierl & Damien Cormier (CSSE, 2016)

A method to validate items produced using automatic item generation
Mark Gierl & Hollis Lai (NCME, 2016)

Modeling the global text features for enhancing the automated scoring system
Fahad Latifi & Mark Gierl (NCME, 2016)

Recovering the item model structure from automatically generated items using graph theory
Xinxin Zhang & Mark Gierl (NCME, 2016)

Using technology-enhanced processes to generate items in multiple languages
Hollis Lai & Mark Gierl (NCME, 2016)

Modeling approaches for automatic item generation in dental education
Hollis Lai, Mark Gierl, Andrew Spielman, Ellen Byrne & David Waldschmidt (ADEA, 2016)


A Novel Approach for Quantifying Semantics of Automatically Generated Items
Syed F. Latifi, Mark J. Gierl, Ren Wang & Andong Wang (NCME 2015)

Developing and Validating the Attitudes Towards Mistakes Inventory (ATMI): A Self-Report Measure
Jacqueline P. Leighton, Wei Tang, Qi Guo (NCME 2015)

Accounting for Affective States in Response Processing Data: Impact for Validation
Jacqueline P. Leighton (NCME 2015)

A Method for Multilingual Automatic Item Generation
Mark J. Gierl, Hollis Lai, Lorena Houston, Changhua Rich & Keith Boughton (ATP 2015)

Evaluating the Psychometric Properties of Generated Test Items
Mark J. Gierl, Hollis Lai, André-Philippe Boulais, André De Champlain, Claire Touchie & Debra Pugh (ATP 2015)

Using Automatic Item Generation to Develop Practice Non-Verbal Reasoning Items for a High-Stakes Admissions Test
Marita Ball & Mark J. Gierl (ATP 2015)


Student and School Factors Associated with Aberrant Response Patterns on a Large Scale Assessment
Amin Mousavi & Ying Cui (AERA 2014)

Using hierarchical linear modeling to examine factors predicting students' reading achievement.
Karen Fung & Samira ElAtia (CSSE 2014)


Analyzing data from educational surveys: a comparison of HLM and Multilevel IRT
Amin Mousavi (NCME 2013)

Evaluating the translations on item models in automatic item generation.
Karen Fung & Mark J. Gierl (NCME 2013)

Using linked elements for creating item models of multiple languages.
Karen Fung & Mark J. Gierl (CSSE 2013)

Evaluate the performance of lz and l*z of person fit: A simulation study.
Amin Mousavi & Ying Cui (NCME 2013)

Internal Consistency: Do We Really Know What It Is and How To Assess It?
Wei Tang & Ying Cui (AERA 2013)

Towards Automated Scoring using Open-Source Technologies
Sayed M. Fahad Latifi, Qi Guo, Mark J. Gierl, Amin Mousavi, Karen Fung (CSSE 2013)

Establishing Item Uniqueness for Automatic Item Generation.
Sayed M. Fahad Latifi, Mark J. Gierl, Hollis Lai & Karen Fung (NCME 2013)

Defeating the Automated Scoring: Is it Possible to Cheat in Automatic Essay Scoring?
Syed M. Fahad Latifi, Karen Fung, Mark J. Gierl, Amin Mousavi & Qi Guo (AERA 2013)

Using Automated Processes to Generate Test Items in Multiple Languages
Dr. Mark J. Gierl, Karen Fung, Dr. Hollis Lai, Dr. Bin Zheng (NCME 2013)


Gestalt Principles in Physics Education: Does it come with Teaching Experience?
Man-Wai Chu (CSSE 2012)

Testing Expert-Based vs. Student-Based Cognitive Models for a Grade 3 Diagnostic Mathematics Assessment
Mary Roduta Roberts (AERA 2012)

Issues of Cost, Time and Validity: Psychometric Perspectives on Technologically-Rich Innovative Assessments (TRIAs)
Jacqueline P. Leighton (AERA 2012)

Bootstrap Confidence Intervals for the Range-Restricted Coefficient Alpha
Johnson Ching-hong Li, Ying Cui, Mark J. Gierl & Wai Chan (AERA 2012)

Examining Language Proficiency, Test Performance, and Test Fairness using Data from the Pan-Canadian Assessment Program
Karen Fung, Samira ElAtia, & Mark J. Gierl (AERA 2012)

A Simulation Study for Comparing Three Lower Bounds to Reliability
Wei Tang & Ying Cui (AERA 2012)

Detecting Directional DIF using CATSIB with Impact Present
Man-Wai Chu, Hollis Lai, Xian Wang (NCME 2012)

Estimating Classification Consistency and Accuracy for Cognitive Diagnostic Assessment
Ying Cui (NCME 2012)

Design Principles Required for Skills-Based Calibrated Item Generation
Hollis Lai & Mark J. Gierl (NCME 2012)

Item Consistency Index: An Item-Fit Index for Cognitive Diagnostic Assessment
Hollis Lai, Mark J. Gierl & Ying Cui (NCME 2012)

Methods for Creating and Evaluating the Item Model Structure Used In Automatic Item Generation
Mark J. Gierl, Hollis Lai & Krista Breithaupt (NCME 2012)


Developing and Evaluating Score Reports for a Diagnostic Mathematics Assessment
Mary Roduta Roberts & Mark Gierl (AERA 2011)

Does Culture have an Effect on Cognitive Patterns? Examination of Cultural Effect on Categorization.
Alex Riedel & Qi Guo (AERA 2011)

The Role of Item Models in Automatic Item Generation
Mark Gierl & Hollis Lai (NCME 2011)

A Comparison of Logistic Regression, CSIBTEST, and Combined Decision Rule for Detection of Uniform and Nonuniform DIF Items using Real Data
Qi Guo & Alex Riedel (NCME 2011)


Evaluating Statistical Reasoning of College Students in the Social and Health Sciences with Cognitive Diagnostic Assessment
Ying Cui, Mary Roduta Roberts, Andrea Gotzmann (AERA 2010)

Do cognitive models consistently show good model-data-fit for students at different ability levels?
Andrea Gotzmann, Mary Roduta Roberts (AERA 2010)

Using Automated Item Generation to Promote Principled Test Design and Development
Cecilia B. Alves, Mark J. Gierl, & Hollis Lai (AERA 2010)

Using Principled Test Design to Develop and Evaluate a Diagnostic Mathematics Assessment in Grades 3 and 6
Mark J. Gierl, Cecilia Alves, & Renate Taylor Maueau (AERA 2010)

2009

Two Types of Think Aloud Interview for Educational Measurement: Protocol and Verbal Analysis
Jacqueline P. Leighton (NCME 2009)

Using Cognitive Models to Evaluate Ethnicity and Gender Differences
Andrea Gotzmann, Mary Roduta Roberts, Cecilia Brito Alves, & Mark J. Gierl (AERA 2009)

Development of a Framework for Diagnostic Score Reporting
Mary Roduta Roberts & Mark J. Gierl (AERA 2009)

Estimating the Attribute Hierarchy Method with Mathematica
Ying Cui, Mark Gierl, & Jacqueline Leighton

A Comparison of Three Weighting Procedures for High- and Low-Stakes Examinations with Mixed Item Formats in Different Subject Areas
W. Todd Rogers & Denise M. Nowicki (NCME 2009)

Three Applications of Automated Test Assembly within a User-Friendly Modeling Environment
Ken Cor, Cecilia Alves & Mark J. Gierl (NCME, 2009)

Attribute Reliability in Cognitive Diagnostic Assessment
Jiawen Zhou, Mark J. Gierl & Ying Cui (NCME, 2009)

Development of Cognitive Models in Mathematics to Promote Diagnostic Inferences about Student Performance
Mary Roduta Roberts, Cecilia Brito Alves, Andrea Gotzmann & Mark J. Gierl (AERA, 2009)

Using Judgments from Content Specialists to Develop Cognitive Models for Diagnostic Assessments
Mark J. Gierl, Mary Roberts, Cecilia Alves & Andrea Gotzmann (NCME, 2009)



An Experimental Test of Student Verbal Reports and Expert Teacher Evaluation as a Source of Validity Evidence for Test Development
Jacqueline P. Leighton, Colleen Heffernan, M. Kenneth Cor, Rebecca J. Gokiert & Ying Cui (AERA, 2008)

The Hierarchy Consistency Index: Evaluating Person Fit for Cognitive Diagnostic Assessment
Ying Cui & Jacqueline P. Leighton (AERA, 2008)

Using Cochran's Z Statistic to Test the Kernel-Smoothed IRF Differences between Focal and Reference Group
Yinggan Zheng & Mark J. Gierl (AERA, 2008)

Computerized Adaptive-Attribute Testing: Incorporating Psychological Principles with Assessment Practices in Computerized Adaptive Testing
Jiawen Zhou, Mark J. Gierl & Ying Cui (NCME, 2008)

The Role of Academic Confidence and Epistemological Beliefs in Syllogistic Reasoning Performance
Carol M. Okamoto, Jacqueline P. Leighton & M. Kenneth Cor (AERA, 2008)

Testing Expert-Based and Student-Based Cognitive Models: An Application of the Attribute Hierarchy Method and Hierarchy Consistency Index
Jacqueline P. Leighton, Ying Cui & M. Kenneth Cor (NCME, 2008)

Cognitive-Psychometric Modeling of the MELAB Reading Items
Lingyun Gao & Todd Rogers (NCME, 2007)

Purposes of and Issues with the Provincial Testing Programs in Alberta
W. Todd Rogers & Donald A. Klinger (NCME, 2007)


Using Connectionist Models to Evaluate Examinees' Response Patterns on Tests: An Application of the Attribute Hierarchy Method to Assessment Engineering
Mark J. Gierl, Ying Cui & Steve Hunka (NCME, 2007)


Using Real Data to Compare DIF Detection and Effect Size Measures among Mantel-Haenszel, SIBTEST, and Logistic Regression Procedures
Yinggan Zheng, Mark J. Gierl & Ying Cui (NCME, 2007)


Investigating the Cognitive Attributes Underlying Student Performance on the SAT Critical Reading Subtest: An Application of the Attribute Hierarchy Method
Changjiang Wang & Mark J. Gierl (NCME, 2007)


Using the Attribute Hierarchy Method to Make Diagnostic Inferences about Examinees' Cognitive Skills
Mark J. Gierl (ATP, 2007)

2006

The Hierarchy Consistency Index: A Person-fit Statistic for the Attribute Hierarchy Method
Ying Cui, Jacqueline P. Leighton, Mark J. Gierl, & Steve M. Hunka (NCME, 2006)

Simulation Studies for Evaluating the Performance of the Two Classification Methods in the AHM
Ying Cui, Jacqueline P. Leighton, & Yinggan Zheng (NCME, 2006)

Evaluating DETECT Classification Accuracy and Consistency when Data Display Complex Structure
Mark J. Gierl, Jacqueline P. Leighton, & Xuan Tan (NCME, 2006)

A Three-Stage Approach for Identifying Gender Differences on Large-Scale Science Assessments
Rebecca J. Gokiert & Jacqueline P. Leighton (NCME, 2006)

Validity of the Simultaneous Approach to the Development of Equivalent Achievement Tests in English and French (Stage III)
Jie Lin & W. Todd Rogers (NCME, 2006)

Investigating the Cognitive Attributes Underlying Student Performance on a Foreign Language Reading Test: An Application of the Attribute Hierarchy Method
Changjiang Wang, Mark J. Gierl, & Jacqueline P. Leighton (NCME, 2006)

Evaluating the Performance of SIBTEST and MULTISIB Using Different Matching Criteria
Jiawen Zhou, Mark J. Gierl, & Xuan Tan (NCME, 2006)

Evaluating the Consistency of DETECT Indices and Item Clusters Using Simulated and Real Data that Display both Simple and Complex Structure
Xuan Tan & Mark J. Gierl (AERA, 2006)

2005

Evaluating DETECT Classification Accuracy and Consistency when Data Display Complex Structure
Mark J. Gierl, Jacqueline P. Leighton, & Xuan Tan (CSSE, 2005)

Investigating Test Items Designed to Measure Higher-Order Reasoning using Think-Aloud Methods: Implications for Construct Validity and Alignment
Jacqueline P. Leighton & Rebecca J. Gokiert (AERA, 2005)

Investigating the Statistical and Cognitive Dimensions in Large-Scale Science Assessments: Causal and Categorical Reasoning in Science
Jacqueline P. Leighton, Rebecca J. Gokiert, & Ying Cui (AERA, 2005)

The Cognitive Effects of Test Item Features: Informing Item Generation by Identifying Construct Irrelevant Variance
Jacqueline P. Leighton & Rebecca J. Gokiert (NCME, 2005)

Identifying cognition dimensions that affect student performance on the new SAT
Mark J. Gierl, Xuan Tan, & Changjiang Wang (NCME, 2005)

Validity of the Simultaneous Approach to the Development of Equivalent Achievement Tests in English and French (Stage II)
Jie Lin & W. Todd Rogers (NCME, 2005)

Using Five Procedures to Detect DIF with Passage-Based Testlets
Lingyun Gao & Changjiang Wang (NCME, 2005)

Using Global and Local DIF Analyses to Assess DIF across Language Groups
Xuan Tan & Mark J. Gierl (NCME, 2005)

2004

Using a Multidimensionality-Based Framework to Identify and Interpret the Construct-Related Dimensions that Elicit Group Differences
Mark J. Gierl (AERA, 2004)

Gender Differential Item Functioning on the WISC-II: Analysis of the Canadian Standardization Sample
Rebecca J. Gokiert & Kathryn L. Ricker (AERA, 2004)

Robustness of Lord's Formulas for Item Difficulty and Discrimination Conversions between Classical and Item Response Theory Models
Tess Dawber (AERA, 2004)

Standard Setting Using the Attribute Hierarchy Model
Gregory S. Sadesky (NCME, 2004)

The Identification and Interpretation of Group Differences on the Canadian Language Benchmarks Assessment Reading Items
Marilyn Abbott (NCME, 2004)

Using the Multidimensionality-Based DIF Analysis Paradigm to Study Cognitive Skills that Elicit Group Differences: A Critique
Mark J. Gierl (NCME, 2004)

2003

Setting Cut Scores: Critical Review of Angoff and Modified-Angoff Methods
Kathryn L. Ricker (CSSE, 2003)

Standard Setting For Complex Performance Assessments: A Critical Examination of the Analytic Judgment Method
Marilyn Abbott (CSSE, 2003)

Cluster Analysis and its Application In Standard Setting
Gregory S. Sadesky (CSSE, 2003)

Standard-setting Issues in Computerized-Adaptive Testing
Matthew M. Gushta (CSSE, 2003)

The Bookmark Standard Setting Procedure: Strengths and Weaknesses
Jie Lin (CSSE, 2003)

Promoting Gender Equity in Alberta's Provincial Social Studies 30 Diploma Examinations
Marilyn Abbott (NCME, 2003)

Differential Validity and Utility of Successive and Simultaneous Approaches to the Development of Equivalent Achievement Tests in French and English
W. Todd Rogers, Mark J. Gierl, Claudette Tardif, & Jie Lin (NCME, 2003)

Implications of the Multidimensionality-Based DIF Analysis Framework for Selecting a Matching and Studied Subtest
Mark J. Gierl & Daniel M. Bolt (NCME, 2003)

Evaluating the Comparability of English- and French-Speaking Examinees on a Science Achievement Test Administered using Two-Stage Testing
Gautam Puhan & Mark J. Gierl (NCME, 2003)

Differential Performance by Gender in Foreign Language Testing
Jie Lin & Fenglan Wu (NCME, 2003)

2002

Identifying Content and Cognitive Skills that Produce Gender Differences in Mathematics: A Demonstration of the DIF Analysis Framework
Mark J. Gierl, Jeffrey Bisanz, Gay L. Bisanz, & Keith A. Boughton (NCME, 2002)


The Attribute Hierarchy Model for Cognitive Assessment
Jacqueline P. Leighton, Mark J. Gierl, & Stephen M. Hunka (NCME, 2002)


The Cognitive Experience of Bookmark Standard Setting Participants
Tess Dawber & Daniel M. Lewis (AERA, 2002)


Cognition or Motivation: What leads to Performance Differences in Science
Gautam Puhan & Huiqin Hu (NCME, 2002)

2001

Illustrating the Utility of Differential Bundle Functioning Analyses to Identify and Interpret Group Differences on Achievement Tests
Mark J. Gierl, Jeffrey Bisanz, Gay L. Bisanz, & Keith A. Boughton (AERA, 2001)


Effects of Random Rater Error on Parameter Recovery of the Generalized Partial Credit Model and Graded Response Model
Keith A. Boughton, Don A. Klinger, & Mark J. Gierl (NCME, 2001)


Differential Bundle Functioning on Three Achievement Tests: A Comparison of Aboriginal and Non-Aboriginal Examinees
Christine N. Vandenberghe & Mark J. Gierl (AERA, 2001)


Construction of Automated Parallel Forms and Multiple Parallel Panels in Computer-Adaptive Sequential Testing: New Measures of Parallelism and Their Applications
Keith A. Boughton, Fernando L. Cartwright, & Mark J. Gierl (AERA, 2001)


Differential Bundle Functioning on Social Studies High School Certification Exams
Keith A. Boughton, Tess E. Dawber, & Laurie-Ann M. Hellsten (AERA, 2001)

2000

Identifying Sources of Differential Item Functioning on Translated Tests: A Confirmatory Approach
Mark Gierl & Shameem Nyla Khaliq (NCME, 2000)


Reducing Type I Error Using an Effect Size Measure with the Logistic Regression Procedure for DIF detection
Michael Jodoin & Mark Gierl (NCME, 2000)


Comparison of Ability Estimates from Dichotomously and Nominally-Scored Testwise Susceptible and Non-susceptible Items
Joanna Tomkowicz & W. Todd Rogers (AERA, 2000)


Performance of Mantel-Haenszel, SIBTEST, and Logistic Regression when the Number of DIF items is Large
Mark Gierl, Michael Jodoin, & Terry Ackerman (AERA, 2000)


Automated Test Assembly Procedures for Criterion-Referenced Testing Using Optimization Heuristics
Keith A. Boughton & Mark J. Gierl (AERA, 2000)


Differential Bundle Functioning on Mathematics and Science Achievement Tests: A Small Step Toward Understanding Differential Performance
Keith A. Boughton, Mark J. Gierl, & Shameem Nyla Khaliq (CSSE, 2000)

1999

Assessing the Computational Accuracy in Statistical Packages
Steve Hunka (August, 1999)


Using Statistical and Judgmental Reviews to Identify and Interpret Translation DIF
Mark J. Gierl, W. Todd Rogers, & Don Klinger (NCME, 1999)


Gender Differential Item Functioning in Mathematics and Science: Prevalence and Policy Implications
Mark Gierl, Shameem Nyla Khaliq, & Keith A. Boughton (CSSE, 1999)

1998

Teacher Evaluation
Robert Stake (November, 1998)

Principles for Fair Student Assessment Practices for Education in Canada

Downloadable in English or French