INDEX
Explanations
references to violence and casualties
Negative Logits
Killing      -0.15
quit         -0.15
Dead         -0.15
dead         -0.14
apon         -0.14
rani         -0.14
andalone     -0.14
acco         -0.14
evid         -0.14
asp          -0.14
Positive Logits
during       0.21
trying       0.18
by           0.17
when         0.16
while        0.16
during       0.16
eva          0.15
falling      0.15
During       0.15
attempting   0.15
Activations Density 0.066%
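
The density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading only; it assumes "density" means the percentage of tokens with an activation above zero, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density is the fraction of sampled tokens with a
    nonzero (above-threshold) activation, expressed as a percentage.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: one active token out of 1000 sampled tokens.
sample = [0.0] * 999 + [2.5]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.066% would correspond to the feature firing on roughly 66 of every 100,000 sampled tokens.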