INDEX
    Explanations
    Negative Logits
    " discuss"        -0.08
    " mag"            -0.07
    " EMS"            -0.07
    " разобраться"    -0.07
    " piac"           -0.07
    " discusses"      -0.07
    " scale"          -0.07
    "dan"             -0.07
    " mountain"       -0.07
    " обс"            -0.07
    Positive Logits
    0.09
     heterosexual
    0.09
    最多
    0.08
    uelle
    0.08
     manicure
    0.08
    яў
    0.08
     kaarten
    0.08
     הזמן
    0.08
    িযোগ
    0.08
    0.08
    Act Density 0.015%

    No Known Activations