INDEX
    Explanations
    Negative Logits
     came
    -0.07
     spoke
    -0.07
    OMETRY
    -0.07
     went
    -0.07
     walked
    -0.07
     remained
    -0.07
    -0.07
     showed
    -0.07
    했습니다
    -0.06
    ,status
    -0.06
    Positive Logits
    loff
    0.07
    950
    0.07
     fireworks
    0.07
     Зам
    0.07
     Geme
    0.06
    àm
    0.06
    خم
    0.06
    Merc
    0.06
    fin
    0.06
    327
    0.06
    Act Density 0.508%

    No Known Activations