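    NEGATIVE LOGITS
    " "                    0.91
    (token not rendered)   0.84
    (token not rendered)   0.74
    (token not rendered)   0.71
    "ון"                   0.70
    (token not rendered)   0.70
    " alveolar"            0.68
    " данные"              0.65
    " zatem"               0.64
    " suro"                0.63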

    POSITIVE LOGITS
    "ت"      1.15
    "ed"     1.02
    "س"      0.95
    "ts"     0.84
    "सँग"    0.82
    "تهم"    0.82
    "k"      0.82
    "с"      0.79
    "et"     0.78
    "da"     0.77
    Activation density: 0.003%

    No Known Activations