INDEX
    Explanations
    No Explanations Found
    Negative Logits
    ль    1.32
    ра    1.23
    را    1.16
    .    1.13
    لی    1.09
    ach    1.07
    या    1.06
    -    1.06
    ρο    1.05
    ור    1.05
    Positive Logits
    i    1.48
    ي    1.30
    d    1.13
    ために    1.04
    تالي    1.01
    0    0.98
     جوړونک    0.96
    いろんな    0.96
    0.95
    يي    0.94
    Act Density 0.000%

    No Known Activations