    NEGATIVE LOGITS
    Token           Logit
    "707"           -0.07
    " gall"         -0.07
    " rsp"          -0.07
    "*p"            -0.07
    " contempl"     -0.07
    " softball"     -0.07
    " under"        -0.06
    " rapp"         -0.06
    " bots"         -0.06
    " ded"          -0.06

    POSITIVE LOGITS
    Token           Logit
    " increase"     0.14
    " increased"    0.13
    " Increase"     0.12
    " increasing"   0.12
    " increases"    0.11
    " Increased"    0.11
    "Increased"     0.11
    "increase"      0.10
    "incre"         0.10
    "Increase"      0.10

    Activation Density: 0.068%

    No Known Activations