INDEX
    Explanations
    Negative Logits
    " prof"      -0.07
    " Str"       -0.07
    " churches"  -0.07
    " Yorkers"   -0.07
    "curl"       -0.07
    " Heading"   -0.07
    "_pp"        -0.06
    " Sans"      -0.06
    " Paul"      -0.06
    " Ell"       -0.06
    Positive Logits
    " discrete"  0.08
    " discret"   0.08
    "vertime"    0.07
    " Prescott"  0.07
    " mute"      0.06
    "ete"        0.06
    "affle"      0.06
    " glare"     0.06
    "zeň"        0.06
    " discreet"  0.06
    Activation Density: 0.003%

    No Known Activations