INDEX
    Explanations

    phrases related to information accuracy and ethical considerations

    Negative Logits
    " malheure"        -0.99
    " unwarran"        -0.96
    " Wtf"             -0.94
    " shenan"          -0.88
    "Ikr"              -0.86
    "Yess"             -0.84
    "Noice"            -0.84
    " disagre"         -0.84
    " Lmao"            -0.83
    " effray"          -0.83

    Positive Logits
    "<bos>"             1.05
    " interesting"      0.73
    " story"            0.64
    "interesting"       0.63
    "sworth"            0.61
    " Himo"             0.58
    " interes"          0.58
    " fascinating"      0.57
    " juicy"            0.56
    " ContentValues"    0.56

    Act Density 0.555%

    No Known Activations