    NEGATIVE LOGITS
    Token     Logit
    " on"     1.30
    "ra"      1.25
    "t"       1.15
    "ות"      1.14
    " as"     1.09
    "re"      1.06
    "al"      1.05
    "ti"      0.98
    "to"      0.95
    "n"       0.95
    POSITIVE LOGITS
    Token                   Logit
    " pioneers"             1.07
    "ā"                     0.89
    (token not rendered)    0.84
    "ب"                     0.84
    "{"                     0.82
    " pioneering"           0.78
    "+:"                    0.77
    "0"                     0.77
    " pioneered"            0.75
    " Pioneers"             0.75
    Activation Density: 0.002%

    No Known Activations