    Explanations

    punctuation marks or the presence of commas in the text

    Negative Logits
    Ïĥμο           -0.14
    reject         -0.14
    ute            -0.14
    pers           -0.14
    ...↵           -0.14
     Hi            -0.14
    (whitespace)   -0.14
    â̦             -0.14
    illin          -0.13
    ush            -0.13
    Positive Logits
    000             0.18
    ĶĶ              0.14
    gor             0.14
    ĻĤ              0.13
    cor             0.13
    orners          0.13
    ãĥ³ãĤ¿          0.13
    Û°Û°Û°          0.13
    ousand          0.13
    UBLIC           0.12
    Activation Density: 0.088%

    No Known Activations