INDEX
    Explanations
    No Explanations Found
    Negative Logits
    ین          2.06
    ется        1.77
    こと        1.77
    Сто         1.75
    ینگ         1.64
    Зна         1.59
    З           1.59
    Те          1.57
    Ста         1.55
    Та          1.55
    Positive Logits
    speople     1.80
                1.77
    sigh        1.71
    e           1.66
    MAY         1.55
     אף         1.49
    tails       1.47
    वर्ती        1.45
    gherita     1.45
    GOT         1.44
    Activation Density: 0.080%

    No Known Activations