INDEX
    Explanations
    Negative Logits
    Token        Logit
    " Fritz"     -0.07
    " Sunni"     -0.07
    "scar"       -0.07
    " niž"       -0.06
    "Gap"        -0.06
    " thấp"      -0.06
    " outro"     -0.06
    "طور"        -0.06
    " hodin"     -0.06
    " bean"      -0.06
    Positive Logits
    Token        Logit
    " glad"      0.07
    " ext"       0.07
    " Глав"      0.06
    "经理"       0.06
    " Например"  0.06
    " жиз"       0.06
    " artikel"   0.06
    " attn"      0.06
    " Pamela"    0.06
    " embodied"  0.06
    Activation Density: 0.014%

    No Known Activations