    NEGATIVE LOGITS
    Token          Logit
    " Pearl"       -0.07
    "olithic"      -0.07
    " eder"        -0.07
    " dece"        -0.06
    "POSITE"       -0.06
    "efa"          -0.06
    " betrayal"    -0.06
    " века"        -0.06
    "voie"         -0.06
    "-board"       -0.06
    POSITIVE LOGITS
    Token          Logit
    " run"         0.15
    " Run"         0.14
    " runs"        0.12
    "Run"          0.12
    " running"     0.12
    " Running"     0.11
    " RUN"         0.11
    "run"          0.11
    " ran"         0.10
    " Ran"         0.10
    Activation Density: 0.057%

    No Known Activations