    Explanations

    "for" used to express purpose or reason

    Negative Logits
    Token     Logit
    in        1.82
    I         1.38
    O         1.38
    em        1.03
    D         1.02
    S         1.02
    B         0.98
    K         0.96
    (blank)   0.95
    E         0.90
    Positive Logits
    Token     Logit
    ע         1.30
    ку        1.09
    lt        0.99
    rt        0.88
    (blank)   0.86
    cc        0.84
    ни        0.84
    ds        0.84
    ku        0.83
    " be"     0.82
    Act Density 0.641%

    No Known Activations