INDEX
    Explanations
    NEGATIVE LOGITS
    Token              Logit
    " _("              -0.09
    "Е"                -0.08
    " appreciative"    -0.08
    " могуць"          -0.08
    " hailed"          -0.08
                       -0.08
    " laden"           -0.08
    " curate"          -0.08
    " inexp"           -0.07
    " referenced"      -0.07
    POSITIVE LOGITS
    Token              Logit
    " JW"              0.08
    " Wolf"            0.08
    " Bah"             0.08
    " bah"             0.08
    " Shay"            0.08
    " STF"             0.07
    " Torres"          0.07
    " Ped"             0.07
    "Islam"            0.07
    " Rasmussen"       0.07
    Activation Density: 0.021%

    No Known Activations