INDEX
    Explanations

    certainty and confidence expressed through affirmative language

    Negative Logits
    ester      -0.17
    etine      -0.16
     Bender    -0.15
    lez        -0.15
    èĸ         -0.14
    yt         -0.14
    agma       -0.14
    Ñĩи        -0.14
    ÅĻeb       -0.14
    apan       -0.14

    Positive Logits
    ulas       0.15
     Dream     0.15
    addon      0.14
    252        0.14
    885        0.14
    ijo        0.14
    undo       0.14
    ICAST      0.14
    uchi       0.14
    718        0.13
    Act Density 0.245%
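
    A minimal sketch of how a density figure like the one above could be computed, assuming "Act Density" means the fraction of token positions on which the feature has a non-zero activation. That reading is an assumption, as are the function name `activation_density`, the `threshold` parameter, and the sample data, which is made up purely so the arithmetic lands on the same percentage.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density is defined as (active positions) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data, not taken from the dashboard:
# 20,000 token positions, 49 of which have a non-zero activation.
sample = [0.0] * 19951 + [0.5] * 49
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

    Under this definition, a density well below 1% means the feature is sparse: it fires on only a small fraction of the tokens it was evaluated on.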

    No Known Activations