    Explanations

    derogatory and offensive language

    derogatory terms and insults

    NEGATIVE LOGITS
    "qua"               -0.71
    "tnc"               -0.71
    " unification"      -0.67
    " Horizons"         -0.67
    " conduc"           -0.67
    " transformative"   -0.65
    " bilateral"        -0.65
    " tranqu"           -0.65
    " Passage"          -0.64
    " stabilization"    -0.63
    POSITIVE LOGITS
    " bastard"    1.06
    "fuck"        1.05
    " liar"       1.01
    " bitch"      1.00
    " Bastard"    1.00
    " cunt"       0.99
    " asses"      0.98
    " asshole"    0.98
    " idiot"      0.97
    "hole"        0.96
    Act Density 0.137%

    No Known Activations