    NEGATIVE LOGITS
    Token           Logit
    " luas"         -0.08
    ".Fr"           -0.08
    ".Shapes"       -0.08
    " diabetic"     -0.08
    " tires"        -0.08
    " 높은"          -0.07
    " espe"         -0.07
    "bea"           -0.07
    "Luc"           -0.07
    " далеко"       -0.07
    POSITIVE LOGITS
    Token           Logit
    " означ"        0.09
    " denying"      0.09
    "reject"        0.08
    " criticizing"  0.08
    " competente"   0.08
    " penal"        0.08
    " Kent"         0.08
    " Penal"        0.08
    " equivalente"  0.08
    " existential"  0.07
    Activation Density: 0.012%

    No Known Activations