INDEX
    Explanations

    the presence of specific contextual markers or format indicators in the text

    URLs and common content markers

    Negative Logits
    Token              Logit
    '}>                -0.73
    .*")]              -0.70
     Normdatei         -0.68
    "]];               -0.68
     Италијани         -0.68
    protoimpl          -0.67
    ]]\n               -0.65
     tartalomajánló    -0.65
    ]]:                -0.65
    Diweddarwch        -0.64
    Positive Logits
    Token    Logit
    /        1.27
    /?       0.80
    /…       0.79
    /"       0.76
    /.       0.75
    /%       0.73
    /...     0.73
    /'       0.72
    /,       0.70
    /}{      0.68
    Activation Density: 0.205%

    No Known Activations