    NEGATIVE LOGITS
    "ك"       1.15
    "ە"       1.13
    "ן"       1.13
    "ς"       1.10
    "лі"      1.07
    "ся"      1.03
    "ธ์"      1.03
    "റ്റർ"    1.02
    "માં"     1.02
    "ے"       1.01
    POSITIVE LOGITS
    "n"       1.60
    "A"       1.42
    "W"       1.37
    " be"     1.31
    "ت"       1.28
    "X"       1.27
    " was"    1.24
    "at"      1.23
    "i"       1.23
    " \"      1.22
    Activation Density: 0.028%

    No Known Activations