INDEX
    Explanations
    Negative Logits
     haz
    -0.08
     zing
    -0.08
    functional
    -0.08
     Lima
    -0.07
    -0.07
    	List
    -0.07
    koop
    -0.07
     asap
    -0.07
     meel
    -0.07
     stomach
    -0.07
    Positive Logits
     adi
    0.08
     discrep
    0.08
    ersen
    0.08
    орам
    0.07
     chosen
    0.07
     betrieben
    0.07
     empfohlen
    0.07
     scholarship
    0.07
     einge
    0.07
     Engagement
    0.07
    Act Density 0.002%

    No Known Activations