INDEX
    Explanations
    Negative Logits
        ' the'              0.44
        '$'                 0.44
        ' an'               0.43
        'ال'                0.41
        '\'                 0.40
        ' it'               0.39
        (token not shown)   0.39
        ' \'                0.36
        ' as'               0.32
        'ization'           0.32
    Positive Logits
        'at'                0.48
        'AT'                0.44
        'OM'                0.43
        'TS'                0.43
        'BS'                0.42
        'νο'                0.39
        (token not shown)   0.37
        'AB'                0.37
        'UR'                0.36
        'OL'                0.36
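    Dashboards of this kind commonly build the two logit lists by taking the dot product of the feature's decoder direction with each token's unembedding vector, then keeping the highest and lowest scores. This page does not say how its values were computed, so the following sketch is an assumption. The function `top_logit_effects`, its inputs and the toy vocabulary are all hypothetical, and the sketch uses a plain dot product with no normalization. Here the negative effects keep their minus sign, whereas the list above prints them unsigned.

```python
from typing import Dict, List, Sequence, Tuple

Scored = List[Tuple[str, float]]


def top_logit_effects(
    decoder_direction: Sequence[float],
    unembedding: Dict[str, Sequence[float]],
    k: int = 10,
) -> Tuple[Scored, Scored]:
    """Hypothetical sketch: score each token by dot(decoder_direction, unembedding[token]).

    Returns (positive, negative): the k highest-scoring tokens (largest first)
    and the k lowest-scoring tokens (most negative first).
    Assumption: no normalization or bias term is applied to the scores.
    """
    scores: Dict[str, float] = {}
    for token, column in unembedding.items():
        if len(column) != len(decoder_direction):
            raise ValueError(f"dimension mismatch for token {token!r}")
        # Direct effect of the feature direction on this token's logit.
        scores[token] = sum(d * u for d, u in zip(decoder_direction, column))

    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    toy_vocab = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0], "d": [0.5, 0.5]}
    print(top_logit_effects([1.0, 0.0], toy_vocab, k=2))
    # prints ([('a', 1.0), ('d', 0.5)], [('c', -1.0), ('b', 0.0)])
```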
    Act Density 0.365%
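    Act Density reads as the percentage of sampled tokens on which this feature is active. The page does not state its sample or its firing rule, so the sketch below assumes a token counts when its activation is strictly above a threshold of zero. The function `activation_density` and its input list are hypothetical. Under that reading, 0.365% corresponds to about 365 firing tokens per 100,000.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of tokens whose activation exceeds `threshold`.

    Assumption: "firing" means activation > threshold (0 by default); the
    dashboard does not document the rule it uses.
    """
    total = 0
    firing = 0
    for value in activations:
        total += 1
        if value > threshold:
            firing += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * firing / total


if __name__ == "__main__":
    # 365 firing tokens out of 100,000 sampled tokens.
    sample = [1.0] * 365 + [0.0] * 99_635
    print(activation_density(sample))
```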

    No Known Activations