INDEX
    Explanations
    NEGATIVE LOGITS
     gotta    1.33
     couldn   1.30
     bukan    1.29
     Конечно  1.24
     Nicht    1.22
     stejně   1.18
     ülke     1.16
     değil    1.15
     Não      1.14
     ਜਾਂ      1.14
    POSITIVE LOGITS
    l         1.16
    u         1.10
    j         1.04
    ,         1.03
    h         1.03
    y         0.94
    at        0.91
    ки        0.90
    w         0.89
    opp       0.85
    Activation Density: 0.090%

    No Known Activations