INDEX
    Explanations

    references to violence, conflict, and geopolitical events

    Negative Logits
    "ĺħ"       -0.93
    "xtap"     -0.81
    "imaru"    -0.68
    "ovie"     -0.67
    "pton"     -0.63
    "pper"     -0.63
    "ube"      -0.63
    "Gra"      -0.63
    "ļé"       -0.62
    "ppe"      -0.61
    Positive Logits
    " than"             1.31
    " stringent"        0.94
    "than"              0.89
    " importantly"      0.89
    " Than"             0.84
    " sophisticated"    0.83
    " frequent"         0.81
    " rigorous"         0.76
    " ado"              0.76
    " broadly"          0.75
    Activation Density: 0.095%

    No Known Activations