    Negative Logits
                  -0.07
     abrasion     -0.07
    ding          -0.07
    uten          -0.07
     lak          -0.07
     desp         -0.07
     abras        -0.07
     қай          -0.07
    .ca           -0.07
     מנ           -0.07
    Positive Logits
     køb          0.08
     Ausnahme     0.08
     intrig       0.08
     equivoc      0.08
    _ERRORS       0.07
     anomaly      0.07
     triv         0.07
                  0.07
                  0.07
     BUG          0.07
    Activation Density: 0.003%

    No Known Activations