    Negative Logits
    Token           Logit
    " bahawa"       1.01
    " 형태"          1.00
    "ोत्तम"          0.94
    " parthen"      0.93
    "<unused21>"    0.87
    "یف"            0.87
    "larım"         0.86
    " eğer"         0.86
    [not shown]     0.86
    " pokud"        0.85
    Positive Logits
    Token           Logit
    "т"             1.35
    "ف"             1.31
    "ки"            1.30
    "いた"          1.29
    "ून"            1.24
    [not shown]     1.24
    "कर"            1.22
    [not shown]     1.15
    [not shown]     1.14
    "ج"             1.13
    Activation Density: 0.035%

    No Known Activations