INDEX
    Explanations
    NEGATIVE LOGITS
    خم        -0.07
    Todo      -0.07
     rocker   -0.07
     tit      -0.07
     nov      -0.07
     phantom  -0.06
     hypo     -0.06
    .tiles    -0.06
    ทธ        -0.06
     teased   -0.06
    POSITIVE LOGITS
    POOL      0.07
    (va       0.07
     исключ   0.06
    524       0.06
    _WRONG    0.06
    schema    0.06
     redund   0.06
     fled     0.06
              0.06
    604       0.06
    Act Density 0.021%

    No Known Activations