INDEX
    Explanations

    The presence of specific formatting or structural elements in text, particularly in mathematical or quantitative contexts.

    Negative Logits
    ?               -0.69
    .               -0.66
    )               -0.58
    </h2>           -0.52
    ↵↵              -0.52
    ,               -0.52
    :               -0.52
    ii              -0.49
    [toxicity=0]    -0.48
    </b>            -0.47
    Positive Logits
     Савезне        1.27
     nakalista      1.03
     &___           1.01
    ьаж             0.97
    Personendaten   0.96
    ########.       0.95
    "]="            0.95
     kaarangay      0.94
     autorytatywna  0.93
    enterOuterAlt   0.92
    Activation Density: 0.000%

    No Known Activations