    Explanations

    negative attributes and behaviors associated with characters, particularly cruelty and abrasiveness

    Negative Logits
    Token         Logit
    "ovice"       -0.17
    "eteria"      -0.17
    "EMPL"        -0.16
    " Bedford"    -0.15
    "ekten"       -0.14
    "laughter"    -0.14
    "830"         -0.14
    " khung"      -0.14
    " Zwe"        -0.14
    "owi"         -0.13
    Positive Logits
    Token         Logit
    " mean"       0.33
    " abrasive"   0.33
    " comb"       0.32
    " ob"         0.30
    " Mean"       0.29
    "mean"        0.29
    " rude"       0.28
    " alo"        0.28
    " confront"   0.27
    " entitled"   0.27
    Activation Density: 0.478%

    No Known Activations