    Explanations

    discussion around preferences and societal norms

    Negative Logits
     sculptured    -0.64
     noOf          -0.64
     !!!           -0.64
    maktadır       -0.63
    omiast         -0.61
    ="#"><         -0.59
     می‌باشد        -0.59
    又は            -0.58
     již           -0.58
     אשר           -0.57
    Positive Logits
     weirdly       1.00
     vaguely       0.97
     goddamn       0.93
     whatnot       0.91
     iirc          0.89
     shitty        0.89
     ostensibly    0.85
     pretty        0.83
     awkwardly     0.82
     lemme         0.82
    Activation Density: 1.833%

    No Known Activations