    NEGATIVE LOGITS
    Token          Logit
    " bait"        -0.07
    "alloween"     -0.07
    " Jan"         -0.07
    "Ser"          -0.07
    " Weber"       -0.07
    "utenberg"     -0.06
    "itor"         -0.06
    " managers"    -0.06
    " Gür"         -0.06
    "makt"         -0.06
    POSITIVE LOGITS
    Token          Logit
    "rh"           0.08
    " Rhodes"      0.08
    "imple"        0.07
    " snaží"       0.07
    " ره"          0.07
    "139"          0.07
    "PLE"          0.07
    " RH"          0.07
    " Rh"          0.07
    " Rhe"         0.06
    Activation Density: 0.009%

    No Known Activations