    Negative Logits
    Token        Logit
     FBI         -0.07
    'in          -0.07
     proj        -0.07
    ’in          -0.07
    -na          -0.06
    ospital      -0.06
    summ         -0.06
    .docs        -0.06
    .Co          -0.06
    رف           -0.06
    Positive Logits
    Token        Logit
     hairstyle   0.08
    _success     0.07
    /news        0.07
     искус       0.07
    /values      0.07
    -left        0.07
    евых         0.07
    oneksi       0.07
     labeled     0.07
                 0.06
    Activation Density: 0.002%

    No Known Activations