INDEX
    Explanations
    Negative Logits
    0    1.46
    š    1.32
    (no visible token)    1.23
    .    0.99
     as    0.98
    يا    0.95
    ี่    0.93
    ร้าย    0.93
     in    0.92
    л    0.90
    Positive Logits
    b    1.78
    m    1.55
    n    1.53
    k    1.45
    z    1.35
    p    1.29
    x    1.29
    er    1.28
    f    1.27
    j    1.24
    Act Density 0.002%

    No Known Activations