INDEX
    Explanations
    Negative Logits
    \  1.02
    '  0.92
    0.91
     an  0.83
    <0x91>  0.82
     in  0.79
    0.76
    𝟮  0.76
    0.75
    ני  0.74
    Positive Logits
    b  1.28
    g  1.23
     process  0.98
    r  0.98
    u  0.98
    ع  0.95
     процесс  0.94
    á  0.89
     Process  0.88
    AB  0.87
    Activation Density: 0.043%

    No Known Activations