INDEX
    NEGATIVE LOGITS
    "ح"               1.32
    " paralyzed"      1.21
    " armoured"       1.19
    " centralised"    1.18
    " neighboring"    1.13
    " centralized"    1.11
    " neighbouring"   1.09
    "u"               1.09
    "ется"            1.08
    " unclear"        1.06
    POSITIVE LOGITS
    " прово"          1.14
    " преди"          1.03
    " соци"           1.02
    " пола"           1.02
    " трево"          1.00
    " bonne"          0.97
    " ги"             0.97
    " прежде"         0.97
    "alanine"         0.97
    " Nagano"         0.97
    Activation Density: 0.002%

    No Known Activations