    Negative Logits
    Token          Logit
    'ْت'            -0.07
    ' mek'         -0.07
    ' militant'    -0.06
    ' writ'        -0.06
    ' RT'          -0.06
    ' bot'         -0.06
    '-set'         -0.06
    ' Akt'         -0.06
    ' owed'        -0.06
    ' goal'        -0.06
    Positive Logits
    Token              Logit
    ' disappears'      0.09
    ' disappeared'     0.08
    ' disappear'       0.08
    ' disappearing'    0.08
    'Disappear'        0.08
    ' disappearance'   0.08
    ' vanished'        0.07
    ' vanish'          0.07
    'pez'              0.07
    '":[-'             0.07
    Activation density: 0.006%

    No known activations.