INDEX
    Explanations
    Negative Logits
    2           0.82
     नाम        0.81
    alsa        0.80
     ਸ਼          0.78
     م          0.77
    agens       0.77
    aston       0.77
    жливо       0.77
                0.76
    aks         0.75
    Positive Logits
    il          1.19
    in          1.03
    ي           0.96
    i           0.95
    er          0.90
    T           0.88
    ir          0.84
    x           0.83
    it          0.78
    u           0.75
    Act Density 0.007%

    No Known Activations