    Negative Logits
    Token         Logit
    " threaten"   -0.07
    " Cutter"     -0.06
    " Goldman"    -0.06
    "canonical"   -0.06
    " soaring"    -0.06
    "soft"        -0.06
    "τες"         -0.06
    " femin"      -0.06
    "anking"      -0.06
    "eter"        -0.06
    Positive Logits
    Token         Logit
    "ibli"        0.07
    " Fare"       0.07
    "_close"      0.07
    "/entities"   0.06
    " zaměst"     0.06
    " semester"   0.06
    " yell"       0.06
    "ierge"       0.06
    ".Modified"   0.06
    " Nga"        0.06
    Activation Density: 0.003%

    No Known Activations