INDEX
    Explanations
    No Explanations Found
    Negative Logits
    "ב"    1.66
    "ט"    1.61
    "0"    1.57
    "ات"    1.50
    "д"    1.48
    "("    1.34
    "טן"    1.31
    "ک"    1.23
    (no token shown)    1.23
    "ों"    1.22
    Positive Logits
    "n"    1.20
    "ure"    1.13
    "p"    1.05
    "m"    1.02
    "man"    1.02
    " و"    0.99
    "é"    0.99
    "aj"    0.97
    "nte"    0.96
    (no token shown)    0.95
    Act Density 0.000%

    No Known Activations