    Negative Logits
    Token               Logit
    "o"                 1.39
    "ای"                1.25
    (no token shown)    1.18
    "as"                1.16
    "ک"                 1.15
    "ق"                 1.12
    "ために"            1.09
    (no token shown)    1.07
    "q"                 1.04
    (no token shown)    1.02
    Positive Logits
    Token               Logit
    " Elephant"         1.23
    "-"                 1.21
    " elephant"         1.16
    " "                 1.11
    "🐘"                0.99
    (no token shown)    0.98
    " elef"             0.96
    "Elephant"          0.95
    "."                 0.92
    " Bhutan"           0.91
    Activation Density: 0.010%

    No Known Activations