INDEX
    Negative Logits
    principalColumn  -0.82
    +#+#             -0.71
    AndEndTag        -0.66
     betweenstory    -0.65
    Instead          -0.65
     estekak         -0.65
    يكب              -0.64
     itſelf          -0.64
     ComVisible      -0.64
    endpush          -0.63
    Positive Logits
    <bos>            0.58
    ",&              0.54
    фика             0.51
     but             0.50
    '                0.47
     must            0.47
     are             0.45
    ↵↵               0.45
     non             0.44
     pur             0.44
    Act Density 0.003%

    No Known Activations