    Explanations

    proper nouns or names of individuals

    mentions of significant events, arrests, and consequences in societal contexts

    Negative Logits
    "!."           -0.69
    "inis"         -0.65
    "}."           -0.64
    "+."           -0.60
    "};"           -0.59
    "cellaneous"   -0.57
    "utterstock"   -0.57
    "''."          -0.56
    ".$"           -0.56
    ".''"          -0.55
    Positive Logits
    " lacks"       0.70
    " hadn"        0.70
    " lacked"      0.69
    " should"      0.69
    " shouldn"     0.66
    " cannot"      0.65
    " had"         0.64
    " hasn"        0.60
    " behaved"     0.60
    " exists"      0.58
    Activation Density: 0.993%

    No Known Activations