INDEX
    Negative Logits
     aperture     -0.07
     частини      -0.07
    ışık          -0.06
    poň           -0.06
    _CA           -0.06
    іч            -0.06
    UFFIX         -0.06
     acomp        -0.06
     ruta         -0.06
    (java         -0.06
    Positive Logits
     актив        0.07
    _CD           0.06
     trope        0.06
     contrasts    0.06
    uste          0.06
     MAL          0.06
     vile         0.06
     oyun         0.06
     engagement   0.06
    .SELECT       0.06
    Activation Density: 0.013%

    No Known Activations