    Negative Logits
    Token            Logit
    " spilled"       -0.88
    " healed"        -0.73
    "onite"          -0.72
    " soaked"        -0.70
    "rek"            -0.70
    "urger"          -0.69
    " offending"     -0.69
    " discharged"    -0.68
    " exc"           -0.68
    "alled"          -0.68

    Positive Logits
    Token            Logit
    "¶"              1.18
    " Why"           1.09
    " Answer"        1.04
    " Where"         1.03
    " Nope"          1.02
    " Consider"      1.01
    " Simply"        0.99
    "[/"             0.98
    " Does"          0.98
    " What"          0.97

    Activation Density: 0.090%

    No Known Activations