    Negative Logits
    Token       Logit
    "on"        1.41
    " on"       1.30
    "F"         1.09
    "L"         1.08
    "ב"         1.08
    "?"         1.07
                1.05
    "of"        1.03
    "J"         1.03
    "ت"         0.99
    Positive Logits
    Token       Logit
    " berühm"   0.84
    "۴"         0.82
    "cape"      0.76
    "f"         0.75
    "ta"        0.75
    " we"       0.73
    "്യ"        0.73
    " weaves"   0.73
    "4"         0.73
    "ه"         0.73
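Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the tokens with the largest and smallest results. The page does not state how its numbers were computed, so the following is only a minimal sketch under that assumption. The names `decoder_dir`, `W_U`, `logit_effects`, and `top_and_bottom` are hypothetical, and the toy weights are made up for illustration. Note that the page prints the negative-logit values without a sign, while this sketch keeps the sign.

```python
# Hypothetical sketch: rank vocabulary tokens by one feature's direct logit effect.
# Assumptions (not from the source page):
#   decoder_dir - the feature's decoder vector, length d_model
#   W_U         - unembedding matrix as d_model rows, each of length vocab_size


def logit_effects(decoder_dir, W_U):
    """Return decoder_dir @ W_U: the feature's effect on every vocabulary logit."""
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    pairs = sorted(zip(vocab, effects), key=lambda p: p[1])  # ascending by effect
    negative = pairs[:k]        # smallest (most negative) effects first
    positive = pairs[::-1][:k]  # largest effects first
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 3 made-up tokens.
    decoder_dir = [1.0, -1.0]
    W_U = [[0.5, 2.0, -1.0],
           [0.5, -1.0, 1.0]]
    vocab = ["a", "b", "c"]
    effects = logit_effects(decoder_dir, W_U)
    print(effects)
    print(top_and_bottom(effects, vocab, 1))
```

With a real model, `vocab` would be the tokenizer's token strings, which is why entries such as `" on"` and `"on"` appear as separate rows: the leading space makes them different tokens.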
    Activation density: 0.008%

    No known activations.