INDEX
    Explanations

    the presence of various numerical representations and symbols in the text

    Negative Logits
    Token       Logit
    baugh       -0.17
    okable      -0.16
    ographer    -0.15
    xee         -0.15
    NST         -0.14
    стру        -0.14
    icontrol    -0.14
    ражд        -0.14
    _UNUSED     -0.13
    xm          -0.13
    Positive Logits
    Token       Logit
    181         0.28
    178         0.27
    179         0.27
    186         0.25
    184         0.25
    183         0.24
    177         0.23
    191         0.23
    190         0.23
    182         0.23
    Act Density 0.085%

    No Known Activations