INDEX
    Explanations
    Negative Logits
                    -0.08
    лев             -0.08
    Trabajo         -0.08
     pekerjaan      -0.08
    Му              -0.07
    keun            -0.07
     peker          -0.07
     cheval         -0.07
    ��              -0.07
     asper          -0.07
    Positive Logits
     identity       0.08
    (raw            0.08
    urable          0.08
                    0.07
    orous           0.07
     idol           0.07
     суд            0.07
     aloha          0.07
    will            0.07
     Ark            0.07
    Activation Density: 0.017%

    No Known Activations