INDEX
    Explanations
    Negative Logits
    Token            Logit
    " WG"            -0.07
    " AIS"           -0.06
    " tolerate"      -0.06
    " mouth"         -0.06
    "gradable"       -0.06
    " WC"            -0.06
    "(line"          -0.06
    " Ros"           -0.06
    " Tournament"    -0.06
    " hogy"          -0.06
    Positive Logits
    Token                 Logit
    " Можно"              0.07
    " Türkçe"             0.06
    ".jetbrains"          0.06
    ".pub"                0.06
    (token not rendered)  0.06
    " Boise"              0.06
    "ieber"               0.06
    "ietet"               0.06
    "Penn"                0.06
    " 가능한"             0.06
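The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way such lists are produced, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest scores. The sketch below uses plain Python lists; `logit_effects`, `top_and_bottom`, and the toy matrices are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix W_U (d_model rows x vocab_size columns), giving one
    logit contribution per vocabulary token. (Assumed method, not confirmed.)"""
    vocab_size = len(W_U[0])
    return [sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
            for j in range(vocab_size)]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
    vocab = ["a", "b", "c", "d"]
    W_U = [[0.1, -0.3, 0.0, 0.4],
           [0.2, 0.0, -0.4, 0.0]]
    scores = logit_effects([1.0, 0.5], W_U)
    neg, pos = top_and_bottom(scores, vocab, k=2)
    print("negative:", [tok for tok, _ in neg])
    print("positive:", [tok for tok, _ in pos])
```

With real weights the scores would be computed over the full vocabulary, which is why near-identical values such as the many 0.06 entries can appear side by side.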
    Act Density 0.004%
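Act Density is presumably the percentage of sampled tokens on which the feature's activation is above zero; that definition is an assumption about this dashboard. At 0.004%, it corresponds to roughly 1 token in 25,000. A minimal sketch, with the hypothetical helper `activation_density`:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`
    (assumed definition of "Act Density")."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations given")
    return 100.0 * firing / total


if __name__ == "__main__":
    # One firing token out of 25,000 sampled tokens (toy data).
    acts = [0.0] * 24_999 + [1.3]
    print(f"{activation_density(acts):.3f}%")
```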

    No Known Activations