64th ISI World Statistics Congress - Ottawa, Canada

A Neutral Zone Classifier for Three Classes with an Application to Text Mining

Author

DR
Daniel Jeske

Co-author

  • Dylan Friel
  • Yunzhe Li
  • Benjamin Ellis
  • Herbie Lee
  • Philip Kass

Conference

64th ISI World Statistics Congress - Ottawa, Canada

Format: IPS Abstract

Keywords: classification, text analysis

Session: IPS 421 - Data Science in Statistics: methodological and applied issues

Thursday 20 July 10 a.m. - noon (Canada/Eastern)

Abstract

A classifier may be limited more by its conditional misclassification rates than by its achievable overall misclassification rate. When one or more of the conditional misclassification rates are high, a neutral zone may be introduced to lower, and possibly balance, those rates. In this talk we discuss a novel neutral zone for three-class classifiers and examine some of its properties.
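
As a rough illustration of the idea, the sketch below withholds a label whenever the largest estimated class probability falls below a cutoff, and then computes the misclassification rate conditional on each true class. The thresholding rule, the cutoff value 0.6, the function names, and the choice to count neutral assignments as non-errors are all assumptions made for illustration; the abstract does not specify the neutral zone construction used in the talk.

```python
from collections import Counter

NEUTRAL = "neutral"  # hypothetical label for items placed in the neutral zone


def classify_with_neutral_zone(probs, threshold=0.6):
    """Return the most probable class, or NEUTRAL if its probability is below threshold.

    probs: dict mapping class label -> estimated probability (assumed to sum to 1).
    threshold: illustrative cutoff, not a value taken from the source.
    """
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else NEUTRAL


def conditional_misclassification_rates(true_labels, predicted):
    """Per true class: fraction of its items assigned to a different (non-neutral) class.

    Assumption: a neutral assignment is not counted as an error, and the
    denominator is the number of items in the true class.
    """
    totals = Counter(true_labels)
    errors = Counter()
    for t, p in zip(true_labels, predicted):
        if p != NEUTRAL and p != t:
            errors[t] += 1
    return {c: errors[c] / totals[c] for c in totals}


# Toy data, invented purely for illustration.
true_labels = ["positive", "positive", "mixed", "negative"]
estimated = [
    {"positive": 0.80, "mixed": 0.10, "negative": 0.10},
    {"positive": 0.40, "mixed": 0.45, "negative": 0.15},
    {"positive": 0.70, "mixed": 0.20, "negative": 0.10},
    {"positive": 0.05, "mixed": 0.15, "negative": 0.80},
]
predicted = [classify_with_neutral_zone(p) for p in estimated]
rates = conditional_misclassification_rates(true_labels, predicted)
```

In this toy run the second comment lands in the neutral zone because no class reaches the cutoff, so it does not contribute to the error count for the positive class. Raising the cutoff trades a larger neutral zone for lower conditional error rates.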

Our application is the analysis of student evaluations of teaching. While the use of numerical Likert-scale response data has been fervently discussed in the literature, comparatively little attention has been given to how written comments could be used to give a fuller picture of student satisfaction.

We show how our neutral zone classifier can work in this setting by labeling individual comments as reflecting a positive, mixed, or negative overall experience in the class, and then assigning comments to a neutral zone when the evidence for any one of the three labels is not sufficiently strong. The proportions of comments within a given course that are positive, mixed, or negative can then serve as a summary statistic for what could otherwise be a large corpus of text comments that might not be read. In addition, we analyze the distributions of positive, mixed, and negative comments across gender and ethnicity groups to examine whether any differences parallel those found in analyses of the numerical Likert-scale questions.
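
The per-course summary described above can be sketched as a simple tabulation of label proportions. The record layout, the course identifiers, and the decision to report neutral-zone comments as their own category in the denominator are assumptions for illustration, not details given in the abstract.

```python
from collections import Counter, defaultdict


def label_proportions_by_course(records):
    """Compute, for each course, the share of comments carrying each label.

    records: iterable of (course_id, label) pairs, where label is one of
    "positive", "mixed", "negative", or "neutral" (hypothetical layout).
    """
    counts = defaultdict(Counter)
    for course, label in records:
        counts[course][label] += 1
    return {
        course: {label: n / sum(c.values()) for label, n in c.items()}
        for course, c in counts.items()
    }


# Toy records, invented purely for illustration.
records = [
    ("STAT100", "positive"),
    ("STAT100", "positive"),
    ("STAT100", "negative"),
    ("STAT100", "neutral"),
    ("STAT200", "mixed"),
]
summary = label_proportions_by_course(records)
```

The same tabulation could be keyed on a demographic grouping instead of a course identifier to compare label distributions across groups.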