INDEX
    Explanations
    Negative Logits
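    " touching"        -0.10
    " pitfalls"        -0.09
    "צריך"             -0.08   (Hebrew: "need, must")
    " academ"          -0.08
    " correlated"      -0.08
    " reta"            -0.07
    " προ"             -0.07   (Greek: "pro-, before")
    " emphasizing"     -0.07
    " resistente"      -0.07   (Spanish/Italian/Portuguese: "resistant")
    " invloed"         -0.07   (Dutch: "influence")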
    Positive Logits
    "=True"            0.08
    "grat"             0.08
    " conscience"      0.08
    " anger"           0.08
    " feeling"         0.08
    (token not shown)  0.08
    "_msg"             0.08
    "/conf"            0.07
    (token not shown)  0.07
    " havi"            0.07
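Dashboards of this kind commonly derive the positive and negative logit lists by projecting a feature's decoder direction onto the model's unembedding matrix and reading off the largest and smallest entries. The sketch below assumes that method; the helper `top_logit_effects` and the inputs `w_dec`, `w_unembed`, and `vocab` are hypothetical stand-ins, random toy weights replace real model data, and any extra normalization a given tool may apply (for example folding in a final layer norm) is omitted.

```python
import numpy as np

def top_logit_effects(w_dec, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    w_dec:     (d_model,) decoder direction of one feature (hypothetical input)
    w_unembed: (d_model, vocab_size) unembedding matrix (hypothetical input)
    vocab:     list of token strings, length vocab_size
    """
    # Direct effect of the feature direction on every output logit.
    logits = w_dec @ w_unembed          # shape: (vocab_size,)
    order = np.argsort(logits)          # indices in ascending logit order
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with random weights, not real model data.
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]
w_dec = rng.normal(size=d_model)
w_unembed = rng.normal(size=(d_model, vocab_size))

neg, pos = top_logit_effects(w_dec, w_unembed, vocab, k=3)
print("negative:", neg)
print("positive:", pos)
```

The negative list is sorted from most negative upward and the positive list from most positive downward, matching the order in which the lists above are displayed.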
    Activation Density: 0.002%
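Activation density is usually reported as the share of sampled tokens on which a feature's activation is above zero, so 0.002% corresponds to roughly 1 token in 50,000. A minimal sketch, assuming a threshold of zero and synthetic activations; `activation_density` is a hypothetical helper, not a documented API of any particular tool.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which a feature fires (activation > threshold).

    acts: 1-D array of one feature's activations over a sample of tokens.
    """
    acts = np.asarray(acts)
    return float(np.count_nonzero(acts > threshold)) / acts.size

# Synthetic sample: 100,000 tokens, the feature fires on 2 of them.
acts = np.zeros(100_000)
acts[[123, 45_678]] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.002%
```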

    No Known Activations