INDEX
    Explanations

    mention of terrorist activities or groups

    Negative Logits
    Token           Logit
    "endum"         -0.77
    "galitarian"    -0.74
    "dit"           -0.68
    "bye"           -0.66
    " Quartz"       -0.66
    "resso"         -0.65
    "flush"         -0.63
    " Grape"        -0.63
    "laus"          -0.62
    "Salt"          -0.62
    Positive Logits
    Token           Logit
    "fully"         0.98
    "abad"          0.95
    " attacks"      0.88
    " spree"        0.87
    "efully"        0.86
    "istan"         0.86
    " raids"        0.83
    "fulness"       0.82
    "istani"        0.82
    " bombing"      0.81
    Activation Density: 0.011%

    No Known Activations