    Negative Logits
    ¶Į
    -0.10
    ernaut
    -0.09
    odon
    -0.09
    elic
    -0.08
    الإنجليزية
    -0.08
     còn
    -0.08
    airo
    -0.08
    及其
    -0.08
    ''"
    -0.08
    telefone
    -0.08
    Positive Logits
     below
    0.32
     Below
    0.30
    Below
    0.28
    以下
    0.28
    below
    0.25
     following
    0.22
     ниже
    0.20
    下
    0.20
     Here
    0.19
     BELOW
    0.18
    Activation Density 0.085%

    No Known Activations