INDEX
    NEGATIVE LOGITS
    Token          Logit
    " manifold"    -0.07
    " worlds"      -0.07
    " bundled"     -0.07
    "uptime"       -0.07
    " Alman"       -0.07
    " tapped"      -0.07
    " dumb"        -0.06
    "AMY"          -0.06
    " vanished"    -0.06
    " Wall"        -0.06
    POSITIVE LOGITS
    Token          Logit
    " correct"     0.11
    " incorrect"   0.09
    "correct"      0.09
    "르게"          0.08
                   0.08
    "roc"          0.07
    "Incorrect"    0.07
    "enko"         0.07
    "Correct"      0.07
    "igua"         0.07
    Activation Density: 0.027%

    No Known Activations