Explanations
phrases related to violent incidents or accidents involving individuals
actions or events involving explosions or acts of violence
Negative Logits
equivalents   -0.72
↑             -0.68
blah          -0.66
anymore       -0.62
────          -0.62
sorted        -0.62
=-=-=-=-      -0.61
squared       -0.61
curated       -0.60
equal         -0.59
Positive Logits
himself       1.00
explosives    0.87
fatally       0.81
edly          0.78
andals        0.77
explosive     0.76
angering      0.76
suicide       0.75
rampage       0.72
Himself       0.71
Activations Density 0.275%