Explanations
words related to negative events such as attacks, crashes, and blasts
references to violent incidents or attacks
Negative Logits
ophy      -0.71
anium     -0.68
uchin     -0.67
Intern    -0.67
iodine    -0.64
hemy      -0.64
glomer    -0.63
anol      -0.63
ILCS      -0.62
bil       -0.62
Positive Logits
spree        1.07
occurred     0.86
unfold       0.81
âĶĢâĶĢ       0.81
rampage      0.80
happened     0.79
unfolded     0.78
stemmed      0.78
ordeal       0.77
perpetrated  0.75
Activations Density 0.200%