    Negative Logits
    Token                Logit
    " acknow"            -0.09
    " acknowledges"      -0.09
    " bindings"          -0.09
    " Kalam"             -0.08
    " acknowled"         -0.08
    " acknowledge"       -0.08
    " acknowledgment"    -0.08
    " acknowledging"     -0.08
    " סדר"               -0.08
    " bac"               -0.08
    Positive Logits
    Token                Logit
    " flutter"           0.08
    "Obstacle"           0.08
    " Stadium"           0.08
    "Fac"                0.07
    "-big"               0.07
    " obstacle"          0.07
    "Sized"              0.07
    "Preparing"          0.07
    "突破"               0.07
    "Avatar"             0.07
    Activation Density: 0.000%

    No Known Activations