INDEX
    Negative Logits
    Token              Logit
    "anor"             -0.08
    "ану"              -0.08
    " crumb"           -0.08
    "реб"              -0.08
    "همية"             -0.08
    " dedo"            -0.08
    " paw"             -0.08
    "сць"              -0.08
    " commandments"    -0.07
    "corn"             -0.07
    Positive Logits
    Token              Logit
    "лиж"              0.16
    "ли"               0.09
    "ав"               0.09
    "Uit"              0.08
    ".front"           0.08
                       0.07
    "-impact"          0.07
    "ibl"              0.07
    "imate"            0.07
    "imo"              0.07
    Activation Density: 0.000%

    No Known Activations