    Negative Logits
     Tark       -0.08
    Invest      -0.08
    Matching    -0.08
     Luk        -0.08
    154         -0.08
     Kafka      -0.07
    Broken      -0.07
     sko        -0.07
     VEG        -0.07
     Clamp      -0.07
    Positive Logits
                0.08
     εφαρ       0.08
                0.08
    _REFER      0.07
     הא         0.07
    (ep         0.07
    (r          0.07
     EJ         0.07
     Ej         0.07
     ej         0.07
    Act Density 0.001%

    No Known Activations