INDEX
    Explanations

    phrases related to violent attacks and responsibility claims

    Negative Logits
     ?...      -1.83
     emphat    -1.80
     desir     -1.79
     !...      -1.79
     effe      -1.75
     accla     -1.72
     increa    -1.72
     affor     -1.71
     unden     -1.70
     suscep    -1.69
    Positive Logits
    .          0.83
     while     0.76
     after     0.75
    ;          0.74
     but       0.74
               0.73
     although  0.73
     when      0.73
     for       0.71
     because   0.68
    Act Density 0.483%

    No Known Activations