    Negative Logits
    Token            Logit
    " majority"      -0.07
    "530"            -0.07
    " along"         -0.07
    " smack"         -0.07
    " factoring"     -0.07
    " emancip"       -0.07
                     -0.07
    " cutting"       -0.07
    " Maximum"       -0.06
    ".ev"            -0.06
    Positive Logits
    Token            Logit
    " намер"         0.09
    " نقط"           0.09
    " поговор"       0.09
    "_outline"       0.08
    " खु"            0.08
    "apit"           0.08
    " заключ"        0.08
    " chapitre"      0.08
    "וז"             0.08
    " ముగ"           0.08
    Activation Density: 0.005%

    No Known Activations