INDEX
    Explanations

    references to mental health issues and social injustices

    Negative Logits
    Token            Logit
    " semiclass"     -0.16
    "Ìģc"            -0.14
    "¶Į"             -0.14
    "jee"            -0.13
    "oute"           -0.13
    " Cumhur"        -0.13
    "emachine"       -0.13
    "èĪĪ"            -0.13
    "atures"         -0.13
    " nonatomic"     -0.13
    Positive Logits
    Token            Logit
    " actual"        0.55
    "actual"         0.47
    " Actual"        0.42
    "Actual"         0.42
    " actually"      0.42
    "羣æŃ£"          0.40
    " real"          0.38
    "(actual"        0.35
    "_actual"        0.35
    " true"          0.35
    Activation Density: 0.250%

    No Known Activations