Explanations
phrases related to conflict or violence
phrases indicating the emergence or beginning of conflicts or disturbances
Negative Logits
Token       Logit
antry       -0.79
confir      -0.73
opa         -0.66
downgrade   -0.62
miss        -0.60
misses      -0.60
oppy        -0.59
vetoed      -0.59
retracted   -0.58
osuke       -0.57
Positive Logits
Token       Logit
stretched   0.82
quished     0.75
olate       0.72
flows       0.71
casts       0.71
rer         0.70
flow        0.68
Sax         0.67
rers        0.66
valves      0.64
Activations Density: 0.025%