INDEX
    Explanations

    occurrences of non-word characters and formatting symbols

    Negative Logits
    ritch              -0.15
    iв                 -0.14
    hani               -0.14
    ÑĢиÑĩ             -0.13
    ugas               -0.13
    ÑĢд               -0.13
    orks               -0.13
    atform             -0.13
    ?↵↵↵↵↵↵            -0.13
    ÑĢÑıд             -0.13
    Positive Logits
    {:                 0.16
    #__                0.15
    ãĢģãĢĬ             0.15
    _-_                0.14
     å¯                0.14
    ument              0.14
     #:                0.14
    lagen              0.14
    ëIJĺìĹĪìĬµëĭĪëĭ¤    0.14
    318                0.14
    Act Density 0.048%

    No Known Activations