INDEX
    Explanations
    NEGATIVE LOGITS
    Token         Logit
    "ii"          -0.11
    "an"          -0.10
    " Mikhail"    -0.09
    "iat"         -0.09
    "ation"       -0.09
    " Unsafe"     -0.09
    "eron"        -0.09
    "oir"         -0.09
    "cov"         -0.09
    "alan"        -0.09
    POSITIVE LOGITS
    Token         Logit
    " dioxide"    0.11
    "erra"        0.11
    "ivist"       0.11
    "ssue"        0.11
    "awan"        0.10
    "empo"        0.10
    "plate"       0.10
    "ervers"      0.10
    "angle"       0.10
    "pton"        0.10
    Activation Density: 0.052%

    No Known Activations