INDEX
    Explanations

    phrases or sentences warning about the content of the text

    phrases indicating the presence of content or materials within various contexts

    Negative Logits
    doms        -0.79
     Sabha      -0.75
    apo         -0.69
    urai        -0.67
     Seym       -0.65
    icably      -0.65
    laus        -0.64
    sett        -0.63
    liner       -0.63
    zai         -0.62

    Positive Logits
    ttes        0.77
     contents   0.76
    ゼウス      0.73
    encies      0.72
     Contains   0.72
    ォ          0.69
    ィ          0.69
    iveness     0.68
    Material    0.68
    シャ        0.67
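One common way lists like these are produced is to project the feature's decoder direction through the model's unembedding matrix and keep the most negative and most positive scores. This page does not say how its logits were computed, so treat that as an assumption. Below is a minimal sketch with made-up vectors. `logit_effects` and `top_and_bottom` are hypothetical helpers, not part of any library:

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# Assumption (not stated on this page): the score for token t is the dot product
# of the feature's decoder direction with the unembedding vector for t.

def logit_effects(decoder_dir, unembed):
    """Return {token: score} where score = <decoder_dir, unembed[token]>."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(scores, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]                # most negative first
    top = list(reversed(ranked[-k:]))  # most positive first
    return top, bottom

if __name__ == "__main__":
    # Toy, made-up numbers for illustration only.
    decoder_dir = [1.0, -0.5]
    unembed = {
        " contents": [0.6, -0.3],
        " Contains": [0.5, -0.4],
        "doms": [-0.6, 0.4],
        " Sabha": [-0.5, 0.5],
    }
    top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("positive:", top)
    print("negative:", bottom)
```

With these toy vectors, " contents" and " Contains" come out as the top two tokens, and "doms" and " Sabha" as the bottom two.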
    Activation Density: 0.024%
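An activation density of 0.024% is a fraction of 2.4 × 10⁻⁴, so the feature fires on roughly 24 of every 100,000 tokens. Below is a minimal sketch of that calculation. It assumes "active" means an activation above zero, which the page does not define. `activation_density` is a hypothetical helper:

```python
# Hypothetical sketch: activation density = share of token positions where a feature fires.
# Assumption: "fires" means activation > threshold (default 0.0); the page gives no threshold.

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

if __name__ == "__main__":
    # Made-up activations: 24 nonzero values spread over 100,000 token positions.
    acts = [0.0] * 100_000
    for i in range(24):
        acts[i * 4000] = 1.0
    print(f"{activation_density(acts):.3f}%")  # prints 0.024%
```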

    No Known Activations