    Negative Logits
        "RAY"               -0.07
        "north"             -0.07
        "-ch"               -0.07
        " рождения"         -0.07
        " surreal"          -0.06
        "international"     -0.06
        "ray"               -0.06
        "formace"           -0.06
        "ravel"             -0.06
        "anon"              -0.06
    Positive Logits
        " Hack"             0.07
        "Wunused"           0.07
        " Ngh"              0.07
        " etkin"            0.07
        ".Restrict"         0.06
        " çık"              0.06
        "&gt"               0.06
        " processors"       0.06
        " accuracy"         0.06
        "824"               0.06
    Activation Density: 0.009%

    No Known Activations