Explanations
phrases related to acts of violence
references to individuals involved in an incident or event
Negative Logits
Optim           -0.74
largeDownload   -0.70
★★              -0.69
★               -0.69
htaking         -0.67
uits            -0.67
eworthy         -0.65
Worlds          -0.64
earch           -0.63
Wilderness      -0.63
Positive Logits
'd              1.00
pleaded         0.87
drove           0.87
complained      0.87
'll             0.87
boarded         0.86
swore           0.83
underwent       0.82
awoke           0.82
testified       0.82
Activations Density 0.252%
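
As a rough illustration of what a density figure like this means, the sketch below assumes that activation density is the fraction of sampled token positions on which the feature fires (activation above zero). That definition, the function name activation_density, and the sample counts are assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 252 of which activate the feature.
sample = [0.0] * 99_748 + [1.3] * 252
print(f"{activation_density(sample):.3%}")
```

Under that assumption, a density of 0.252% corresponds to the feature firing on roughly 252 of every 100,000 tokens.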