    Explanations

    overly strong emotional language

    instances of the word "loathe" and its variations

    Negative Logits
    Token             Logit
    "rity"            -0.70
    "rition"          -0.68
    " Buff"           -0.66
    "ramid"           -0.65
    "nesium"          -0.65
    " Bravo"          -0.60
    " Annotations"    -0.60
    " chemistry"      -0.59
    "LESS"            -0.59
    "ITAL"            -0.58
    Positive Logits
    Token             Logit
    "oser"            1.00
    "aves"            0.99
    "omed"            0.93
    "aning"           0.90
    "lder"            0.88
    "aunted"          0.83
    "gged"            0.82
    "ith"             0.82
    "qq"              0.79
    "aned"            0.79
    Activation Density: 0.023%

    No Known Activations