INDEX
    NEGATIVE LOGITS
    تها
    0.64
     wynosi
    0.58
    ের
    0.57
     it
    0.55
    🦍
    0.54
    pont
    0.53
     lends
    0.52
    etop
    0.52
    ים
    0.52
     perfor
    0.52
    POSITIVE LOGITS
     unclear
    0.75
     Mudah
    0.73
    重要的是
    0.71
     difícil
    0.66
     Alors
    0.65
     Возможно
    0.65
     이러한
    0.64
     Можно
    0.64
     시간이
    0.64
    Οι
    0.64
    Activation Density: 0.241%

    No Known Activations