    NEGATIVE LOGITS
    (token text missing)  1.02
    "ין"  1.00
    " vutta"  0.99
    "дже"  0.93
    "б"  0.92
    "ווי"  0.92
    "طات"  0.91
    "리그"  0.89
    "τε"  0.87
    "телни"  0.87
    POSITIVE LOGITS
    "n"  1.70
    "h"  1.38
    (token text missing)  1.20
    "ir"  1.16
    "ly"  1.16
    " can"  1.15
    "et"  1.13
    "س"  1.13
    " you"  1.12
    "ier"  1.09
    Activation Density: 0.008%

    No Known Activations