INDEX
    Explanations

    phrases indicating actions related to reading content

    Tokens preceding ellipses or continuation of text

    more content indicators

    Negative Logits
    __*/
    -1.17
    __':
    
    -0.85
    __(/*!
    -0.73
     समीक्षाएं
    -0.71
    الحياه
    -0.69
     indisponible
    -0.68
    __':
    -0.68
    بوابة
    -0.68
    ThemeOverlay
    -0.67
    رشف
    -0.66
    Positive Logits
    <eos>
    1.31
    ↵↵
    0.51
    <unused60>
    0.50
     urethra
    0.46
    <unused63>
    0.45
    "]}
    0.43
    Попис
    0.41
     brz
    0.41
     chitar
    0.41
    <unused61>
    0.40
    Activation Density 0.179%

    No Known Activations