INDEX
    Explanations
    Negative Logits
    anga        -0.15
    @nate       -0.15
    asco        -0.15
     Freund     -0.13
    onde        -0.13
    herits      -0.13
    oor         -0.13
    acket       -0.13
    esser       -0.13
    μβ          -0.13
    Positive Logits
    zew         0.15
    eurs        0.14
    figcaption  0.14
    summary     0.13
    é϶          0.13
     </         0.13
     pork       0.13
    maz         0.13
     dez        0.13
    dd          0.13
    Activation Density: 0.024%

    No Known Activations