INDEX
    Explanations
    NEGATIVE LOGITS
    Token            Logit
    "Drag"           -0.07
    " Stub"          -0.07
    " drag"          -0.07
    " raw"           -0.07
    " stab"          -0.07
    "ña"             -0.07
    " Barn"          -0.06
    "ray"            -0.06
    " Ung"           -0.06
    " nl"            -0.06
    POSITIVE LOGITS
    Token            Logit
    " successor"     0.09
    " successors"    0.08
    "esser"          0.07
    " succeeds"      0.07
    ".purchase"      0.07
    ".se"            0.07
    "ουσ"            0.07
    "Prices"         0.07
    "zeug"           0.07
    " تصمیم"         0.07
    Activation Density: 0.009%

    No Known Activations