    Negative Logits
     " daunting"       -0.09
     " misleading"     -0.09
     "aal"             -0.08
     " astronom"       -0.08
     " offending"      -0.08
     " nba"            -0.08
     " Token"          -0.08
     " colère"         -0.07
     " ביותר"          -0.07
     " boos"           -0.07

    Positive Logits
     " hyg"             0.11
     " syrup"           0.09
     (unrendered)       0.09
     " gly"             0.09
     " glycer"          0.08
     " cushioning"      0.08
     " visc"            0.08
     " glycol"          0.08
     "Humidity"         0.08
     "humidity"         0.08
    Activation density: 0.008%

    No known activations.