    Negative Logits
        " Emmanuel"             -0.08
        "vimbo"                 -0.08
        (token not rendered)    -0.07
        " disen"                -0.07
        " Boris"                -0.07
        " throughout"           -0.07
        " rumours"              -0.07
        ".task"                 -0.07
        "usan"                  -0.07
        " travels"              -0.07
    Positive Logits
        "_than"                 0.09
        " niż"                  0.09
        "Than"                  0.08
        " berarti"              0.08
        " rethink"              0.08
        "-than"                 0.08
        "意义"                  0.08
        " Appreciation"         0.08
        " rewrite"              0.08
        "_THAN"                 0.08
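Logit lists like these are commonly produced by projecting a feature's decoder direction onto each token's unembedding vector and keeping the most positive and most negative results. The sketch below assumes exactly that procedure. `logit_effect`, `top_and_bottom_logits`, and the toy vectors are hypothetical illustrations, not the dashboard's actual code or this feature's weights.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: assumes each listed logit is the feature's decoder
# direction dotted with that token's unembedding vector.
def logit_effect(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the decoder direction with every token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; all numbers are made up for illustration.
    decoder_dir = [1.0, 0.0]
    unembed = {"than": [0.9, 0.1], " Boris": [-0.7, 0.2], " the": [0.1, 0.5]}
    pos, neg = top_and_bottom_logits(logit_effect(decoder_dir, unembed), k=1)
    print("positive:", pos)
    print("negative:", neg)
```

In a real model the unembedding has one column per vocabulary entry. The same ranking step then yields ten-entry lists like the ones above, with leading spaces kept as part of each token.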
    Activation density: 0.060%
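Activation density is usually read as the fraction of tokens on which the feature fires. Assuming "fires" means an activation strictly above zero, 0.060% works out to roughly 6 active tokens per 10,000. `activation_density` below is a hypothetical helper that shows this arithmetic. The threshold of 0 is an assumption, not a documented setting.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Toy sample: 6 active tokens out of 10,000 (made-up numbers).
    acts = [0.0] * 9_994 + [1.5] * 6
    print(f"{activation_density(acts):.3%}")  # 0.060%
```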

    No Known Activations