INDEX
    Explanations

    phrases indicating certainty or strong opinions

    Negative Logits
      " NOTHING"      -0.20
      " NONE"         -0.19
      "NONE"          -0.17
      "ENTE"          -0.17
      " anything"     -0.17
      "anything"      -0.16
      "nothing"       -0.15
      "none"          -0.15
      "олов"          -0.15
      "icont"         -0.15

    Positive Logits
      " cach"          0.17
      " absolutely"    0.17
      "elas"           0.15
      "imos"           0.14
      " bot"           0.14
      "647"            0.14
      "ergus"          0.14
      "eler"           0.14
      " bott"          0.13
      "452"            0.13
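    The logit lists rank output tokens by how strongly this feature pushes their logits down (negative) or up (positive). A minimal sketch of how such ranked lists can be produced, assuming (this page does not confirm it) that each value is the dot product of the feature's decoder direction with a token's unembedding vector; `logit_effects`, `top_logits`, and the toy data are hypothetical:

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect(token) = decoder_dir . unembedding_row(token).


def logit_effects(decoder_dir, unembed_rows, vocab):
    """Map each vocab token to the dot product of decoder_dir with its unembedding row."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, row))
        for token, row in zip(vocab, unembed_rows)
    }


def top_logits(effects, k=10):
    """Return the k most negative and the k most positive (token, value) pairs."""
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return negative, positive


# Toy example: a 2-dimensional model with three tokens.
effects = logit_effects(
    decoder_dir=[1.0, 0.0],
    unembed_rows=[[0.5, 1.0], [-0.2, 3.0], [0.1, 0.0]],
    vocab=[" absolutely", " NOTHING", "elas"],
)
negative, positive = top_logits(effects, k=1)
print(negative)  # [(' NOTHING', -0.2)]
print(positive)  # [(' absolutely', 0.5)]
```

    Keeping tokens as exact strings (including any leading space) matters, because " anything" and "anything" are distinct vocabulary entries with separate logits.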
    Activation density: 0.190%
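    Activation density is the share of tokens on which the feature fires; 0.190% corresponds to roughly 19 tokens in every 10,000. A minimal sketch, assuming "fires" means an activation strictly above zero (the dashboard's exact threshold and token sample are not stated here); `activation_density` is a hypothetical helper:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens where the feature is active (strictly above the threshold).
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 19 active tokens out of 10,000 gives a density of 0.19%.
acts = [0.0] * 9981 + [2.5] * 19
print(f"{activation_density(acts):.3f}%")  # 0.190%
```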

    No Known Activations