INDEX
    Explanations
    NEGATIVE LOGITS
    er        -0.17
    anton     -0.17
    fps       -0.16
    auer      -0.16
    žen       -0.16
    ajs       -0.15
    ivi       -0.15
    fte       -0.15
     Fot      -0.15
    éĿł       -0.14
    POSITIVE LOGITS
    resher    0.31
    erral     0.31
    uge       0.31
    errals    0.30
    usal      0.30
    eree      0.29
    erring    0.28
    lector    0.28
    inery     0.27
    inement   0.27
    Activation Density: 0.012%

    No Known Activations