    Negative Logits
    " Она"              -2.48
    "'"                 -2.44
    " двумя"            -2.31
    " 飯店"             -2.31
    " However"          -2.30
    " навіть"           -2.22
    (token not shown)   -2.20
    "つまり"            -2.19
    "それにしても"      -2.19
    "colgante"          -2.16
    Positive Logits
    " și"               2.14
    "ATTLE"             2.13
    "un"                2.09
    (token not shown)   2.08
    " ouverture"        2.06
    (token not shown)   2.06
    (token not shown)   2.05
    "us"                2.03
    (token not shown)   2.00
    "da"                1.99
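    Dashboards of this kind commonly produce the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the per-token scores. The page does not state its method, so this is only a sketch under that assumption: `logit_effects`, `top_and_bottom`, and the d_model x vocab layout of the unembedding are hypothetical, and the toy numbers are illustrative, not this feature's weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return one score per vocabulary token: decoder_dir @ unembed.

    Assumes (hypothetically) unembed is laid out as d_model rows x vocab columns.
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    vocab = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab)
    ]


def top_and_bottom(tokens: Sequence[str], scores: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, -0.5]
unembed = [[1.0, 0.0, -1.0, 2.0],
           [0.0, 2.0, 1.0, 0.0]]
scores = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(tokens, scores, k=2)
print(negative)  # [('c', -1.5), ('b', -1.0)]
print(positive)  # [('d', 2.0), ('a', 1.0)]
```

    Sorting the full vocabulary is simple but wasteful for real vocabularies; a partial selection such as `heapq.nlargest` / `heapq.nsmallest` would return the same top-k lists.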
    Activation density: 0.005%

    No known activations
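    Assuming the usual definition, activation density is the percentage of sampled tokens on which the feature's activation is above zero; 0.005% then corresponds to roughly 5 firing tokens per 100,000. A minimal sketch of that arithmetic, where `activation_density` and its `threshold` parameter are hypothetical names and the sample is synthetic:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic sample: 5 firing tokens out of 100,000.
sample = [0.0] * 100_000
for i in range(5):
    sample[i * 20_000] = 1.3
print(f"{activation_density(sample):.3f}%")  # 0.005%
```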