    Negative Logits
    are
    1.42
     are
    1.36
    ها
    1.26
    ется
    1.11
    $,
    1.10
    ;
    1.10
    ן
    0.97
    0.96
    +](=
    0.96
    هاي
    0.95
    Positive Logits
    '
    1.41
     for
    1.35
     Clock
    1.25
    S
    1.22
    B
    1.20
     clock
    1.18
    1.16
    E
    1.13
    U
    1.10
    L
    1.09
    Act Density 0.005%

    No Known Activations