    Explanations

    strong, negative actions or criticisms

    instances of verbal aggression or confrontation in text

    Negative Logits
    Token           Logit
    " Notting"      -0.74
    " orderly"      -0.70
    "Transfer"      -0.68
    " Genius"       -0.67
    " Transform"    -0.65
    "stable"        -0.62
    "sterdam"       -0.59
    " Alive"        -0.59
    " safest"       -0.58
    "kj"            -0.58

    Positive Logits
    Token             Logit
    " accusing"       1.00
    " against"        0.91
    " leveled"        0.84
    " jab"            0.81
    " accuses"        0.81
    " criticizing"    0.81
    " critiques"      0.80
    " critics"        0.79
    " insults"        0.79
    "against"         0.78

    Activation Density: 0.203%

    No Known Activations