INDEX
    Negative Logits
    Token               Logit
    "ляем"              -0.08
    "Elementary"        -0.08
    " kenne"            -0.08
    "iant"              -0.07
    " systematically"   -0.07
    " elementary"       -0.07
    "orough"            -0.07
                        -0.07
    " conos"            -0.07
    " Elementary"       -0.07
    Positive Logits
    Token               Logit
    " تزيد"             0.08
    " sürd"             0.08
    " fills"            0.08
    " caminhos"         0.08
    " renovar"          0.08
    "\tframe"           0.08
    "َى"                0.08
    "dsn"               0.08
    "omie"              0.07
    "-wa"               0.07
    Activation Density: 0.001%

    No Known Activations