    Negative Logits
    _Dis            -0.08
    Dao             -0.07
     ATK            -0.07
    _.              -0.06
    Б               -0.06
    νη              -0.06
    _style          -0.06
     Obviously      -0.06
     kültür         -0.06
     Answer         -0.06
    Positive Logits
     yatır          0.07
    idend           0.06
    [no visible token]  0.06
    ้าต             0.06
     spawned        0.06
     removes        0.06
    ejte            0.06
     들어           0.06
    лач             0.06
     automate       0.06
    Activation Density: 0.000%

    No Known Activations