INDEX
    Negative Logits
    "regated"            -0.85
    " שם"                -0.79
    " sexto"             -0.78
    "lewood"             -0.74
    "reur"               -0.74
    " pleasures"         -0.72
    "lungen"             -0.72
    "Zitat"              -0.71
    " gosta"             -0.71
    (token not shown)    -0.70
    Positive Logits
    "fore"               1.64
    "by"                 1.60
    " самым"             1.38
    (token not shown)    1.31
    "efore"              1.31
    " fore"              1.23
    "BY"                 1.23
    "forth"              1.18
    "而在"               1.17
    "eby"                1.11
    Act Density 0.039%

    No Known Activations