    Explanations

    the presence of various document formatting and structural elements

    Negative Logits
    Token               Logit
    Портали             -0.93
    <?                  -0.88
     ivelany            -0.87
    #+#                 -0.81
    ſelves              -0.80
     bezeichneter       -0.79
    ſelf                -0.79
     كومونز             -0.79
    principalColumn     -0.78
    BeginContext        -0.78

    Positive Logits
    Token               Logit
    ...                 0.51
                        0.51
    --                  0.50
    ,"                  0.50
    <eos>               0.48
    O                   0.47
    ↵↵                  0.46
    (                   0.46
    You                 0.45
    mk                  0.44

    Activation Density: 0.010%

    No Known Activations