INDEX
    Explanations

    phrases indicating risks and threats to health or safety

    Negative Logits
     Tale     -0.17
    ero       -0.15
    èħ        -0.15
    mani      -0.15
    Inline    -0.14
    ani       -0.14
    ona       -0.14
    onas      -0.14
    QRSTUV    -0.14
     imm      -0.14
    Positive Logits
    unden     0.15
    رÙĬÙģ     0.14
    ellation  0.14
    oord      0.14
    ullo      0.14
    grading   0.14
    egasus    0.14
    оÑĢом     0.14
    acus      0.13
    agem      0.13
    Activation Density: 0.012%

    No Known Activations