    Negative Logits
     grasp       -0.75
     matter      -0.74
     buoy        -0.74
     outfielder  -0.74
     affiliate   -0.74
     seasoned    -0.73
     devastated  -0.71
     vigil       -0.71
     candles     -0.71
     editor      -0.70
    Positive Logits
    true         1.35
    classic      1.34
    normal       1.34
    false        1.33
    official     1.32
    traditional  1.30
    cheat        1.30
    Golden       1.29
    Hey          1.28
    Hello        1.28
    Act Density 0.287%

    No Known Activations