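    Negative Logits (token, logit; tokens quoted to show leading spaces)
    "FINE"           -0.77
    "£ı"             -0.70
    "uliffe"         -0.69
    " hypert"        -0.68
    "SIGN"           -0.68
    " shorth"        -0.67
    " prescribing"   -0.65
    " boycot"        -0.64
    " ZIP"           -0.64
    " paved"         -0.64

    Positive Logits (token, logit; tokens quoted to show leading spaces)
    "ources"          1.21
    "ouls"            1.07
    "ourced"          1.04
    "omew"            1.03
    "linger"          1.02
    "ault"            1.01
    "ourcing"         1.01
    "inks"            0.97
    "addle"           0.96
    "essions"         0.95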
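The dashboard does not say how the logit values above are computed. SAE dashboards commonly use a convention of projecting the feature's decoder direction through the model's unembedding matrix, and this sketch assumes that convention; the page itself does not confirm it. The names `w_dec`, `W_U` and `top_logit_effects` are hypothetical, and the sketch skips any final layer norm. It only shows how the most negative and most positive tokens would be ranked under this assumption, using random toy weights in place of the real model's.

```python
import numpy as np

def top_logit_effects(w_dec, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct logit effect.

    Assumption (hypothetical): the effect is the feature's decoder direction
    w_dec (shape: d_model) times the unembedding W_U (shape: d_model x d_vocab),
    with no layer norm applied.
    """
    logits = w_dec @ W_U              # one value per vocabulary token
    order = np.argsort(logits)        # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with random weights (not the real model's weights)
rng = np.random.default_rng(0)
d_model, d_vocab = 16, 50
W_U = rng.normal(size=(d_model, d_vocab))
w_dec = rng.normal(size=d_model)
vocab = [f"tok{i}" for i in range(d_vocab)]
negative, positive = top_logit_effects(w_dec, W_U, vocab, k=10)
```

Sorting the full vector once and reading both ends gives the two lists in a single pass. For a very large vocabulary, `np.argpartition` would avoid the full sort.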
    Activation Density 0.114%
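Activation density is read here as the fraction of token positions where the feature fires. This is an assumption: the sketch counts an activation as "firing" when it exceeds a threshold of 0, which suits ReLU-style SAE activations. The helper name `activation_density` is hypothetical. At 0.114%, the feature would fire on roughly 114 of every 100,000 tokens.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where a feature's activation exceeds threshold.

    The threshold of 0.0 is an assumption (typical for ReLU SAE activations).
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# 114 firing positions out of 100,000 tokens gives a density of 0.114%
acts = np.zeros(100_000)
acts[:114] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.114%
```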

    No Known Activations