INDEX
    Explanations

    sentences with medical or violent content

    statements involving incidents and outcomes, particularly injuries or damages

    Negative Logits
    anting          -0.85
    userc           -0.79
    isphere         -0.77
    itaire          -0.75
    itating         -0.70
    ogl             -0.69
     forwarding     -0.69
    ensibly         -0.69
    anted           -0.68
     intermediate   -0.67
    Positive Logits
     However        1.06
     Additionally   1.05
     Also           0.97
     Photograph     0.95
     Meanwhile      0.94
     Nevertheless   0.92
     Alternatively  0.86
     Nonetheless    0.86
     Furthermore    0.85
     Else           0.85
    Act Density 0.563%

    No Known Activations