INDEX
    Explanations
    NEGATIVE LOGITS
    elay         -0.16
    aky          -0.16
    remen        -0.15
    peare        -0.15
    _callable    -0.15
    ideo         -0.15
    apon         -0.14
    rious        -0.14
    ideos        -0.14
    uto          -0.14
    POSITIVE LOGITS
    essler       0.25
    ehler        0.25
    hr           0.24
    hl           0.22
    eger         0.21
    uffer        0.21
    hn           0.20
    jc           0.20
    eden         0.19
    chter        0.19
    Activation Density: 0.100%

    No Known Activations