INDEX
    Explanations

    references to actions perceived as negative or harmful to others

    expressions of disbelief or shock regarding human actions or events

    Negative Logits
    " Occasionally"   -0.64
    " periodically"   -0.58
    " shortly"        -0.55
    " endif"          -0.54
    "Newsletter"      -0.54
    " assures"        -0.54
    " explains"       -0.53
    " respectively"   -0.52
    " incumb"         -0.51
    " summarizes"     -0.51
    Positive Logits
    " such"           1.30
    "such"            1.12
    "Such"            0.97
    " anything"       0.96
    " this"           0.93
    " Such"           0.93
    "this"            0.92
    " THIS"           0.92
    "these"           0.87
    " THAT"           0.85
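    The logit lists above are typically read as the feature's direct effect on each output token's logit. As a minimal sketch, assuming that effect is the dot product of the feature's decoder row with each column of the unembedding matrix (ignoring the final layer norm), and that each list is simply the k lowest or k highest entries, the ranking could look like this. The helpers `logit_effects` and `top_logit_effects` are hypothetical, and the toy input reuses a few values from the lists above rather than real weights.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_row: List[float], unembed: List[List[float]], vocab: List[str]
) -> Dict[str, float]:
    """Hypothetical helper: direct effect of one feature on each token's logit.

    `unembed` is d_model x vocab_size (one row per residual dimension), so the
    effect on token j is the dot product of `decoder_row` with column j.
    Final layer norm is ignored (an assumption of this sketch).
    """
    return {
        tok: sum(d * row[j] for d, row in zip(decoder_row, unembed))
        for j, tok in enumerate(vocab)
    }


def top_logit_effects(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return (k most negative, k most positive) effects."""
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]  # most negative first
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]  # most positive first
    return negative, positive


# Toy input reusing a few values from the lists above (not real weights).
effects = {
    " Occasionally": -0.64,
    " periodically": -0.58,
    " shortly": -0.55,
    " such": 1.30,
    "such": 1.12,
    "Such": 0.97,
}
negative, positive = top_logit_effects(effects, k=2)
print(negative)  # [(' Occasionally', -0.64), (' periodically', -0.58)]
print(positive)  # [(' such', 1.3), ('such', 1.12)]
```

    Tokens are kept as exact strings, so " such" (with a leading space) and "such" are ranked as separate entries, matching how the lists above treat them.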
    Activation Density: 0.564%
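    The page does not define activation density. A common reading, assumed here, is the percentage of sampled token positions on which the feature's activation is above zero; under that assumption, 0.564% corresponds to roughly 564 active positions per 100,000 tokens. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions where activation > threshold."""
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 564 active positions out of 100,000 sampled tokens gives about 0.564%.
sample = [1.0] * 564 + [0.0] * (100_000 - 564)
print(round(activation_density(sample), 3))  # 0.564
```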

    No Known Activations