INDEX
    Explanations

    terms related to additional or supplementary aspects

    Negative Logits
    " Viene"        -0.68
    " unspeak"      -0.67
    " Lmao"         -0.64
    " affor"        -0.62
    "Fuckin"        -0.60
    " indescri"     -0.60
    " Wtf"          -0.57
    " imprimer"     -0.56
    " Adorable"     -0.56
    " Chapitre"     -0.55
    Positive Logits
    " Extra"        1.08
    " extra"        1.08
    " EXTRA"        1.08
    "extra"         1.05
    "Extra"         1.03
    "EXTRA"         0.99
    " ekstra"       0.90
    " extras"       0.89
    "extras"        0.83
    "xtra"          0.77
    Act Density 0.063%

    No Known Activations