INDEX
    Negative Logits
     neither
    -0.16
     não
    -0.13
     tidak
    -0.12
    ไม
    -0.12
     nicht
    -0.12
     không
    -0.12
     nemus
    -0.11
     cannot
    -0.11
     didn
    -0.11
     not
    -0.11
    Positive Logits
     ever
    0.26
     вообще
    0.20
     indeed
    0.19
     vůbec
    0.19
     EVER
    0.17
     truly
    0.17
     actually
    0.17
     really
    0.17
    本当に
    0.16
     действительно
    0.15
    Act Density 0.125%

    No Known Activations