INDEX
    Negative Logits
    " chess"       -0.07
    " Dup"         -0.06
    " Tr"          -0.06
    " invaders"    -0.06
    "upe"          -0.06
    " Kay"         -0.06
    " pud"         -0.06
    " velvet"      -0.06
    "ژن"           -0.06
    " abortions"   -0.06
    Positive Logits
    "career"       0.08
    " pošk"        0.07
    ".what"        0.07
    ".goods"       0.07
    "_four"        0.06
    "trajectory"   0.06
    "renched"      0.06
    " }}>↵"        0.06
    "(lang"        0.06
    "üssen"        0.06
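    Dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most negative and k most positive tokens. The page does not state how its values were computed, so the following is only a minimal pure-Python sketch under that assumption. `logit_weights`, `top_and_bottom_tokens`, the toy matrices, and the toy vocabulary are all hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_weights(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
) -> List[float]:
    """Assumed method: project a feature's decoder direction (length d_model)
    through an unembedding matrix of shape (d_model, vocab_size), giving one
    score per vocabulary token."""
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom_tokens(
    scores: Sequence[float],
    vocab: Sequence[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k lowest-scoring tokens (most negative first) and the
    k highest-scoring tokens (most positive first)."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
    decoder_dir = [1.0, -0.5]
    unembed = [[0.1, 0.0, -0.2, 0.3], [0.0, 0.2, 0.1, -0.1]]
    vocab = ["a", "b", "c", "d"]
    neg, pos = top_and_bottom_tokens(logit_weights(decoder_dir, unembed), vocab, 2)
    # negative tokens: ['c', 'b']; positive tokens: ['d', 'a']
    print([t for t, _ in neg], [t for t, _ in pos])
```

    Sorting once and slicing both ends keeps the two lists consistent with each other. A real model would use a vectorized matrix product rather than nested Python loops.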
    Activation Density: 0.019%
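    The density figure is usually read as the share of sampled tokens on which the feature fires. The page gives neither the sample size nor the firing threshold, so the sketch below assumes a strictly positive activation counts as "firing". The helper `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.
    Assumption: 'active' means strictly greater than the threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


if __name__ == "__main__":
    # Hypothetical sample: 19 active tokens out of 100,000.
    sample = [1.0] * 19 + [0.0] * 99_981
    print(f"{activation_density(sample):.3f}%")
```

    For example, 19 active tokens out of a hypothetical 100,000 gives 0.019%.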

    No Known Activations