INDEX
Explanations
phrases related to actions of violence, particularly killing
the word "kill" in various contexts
Negative Logits
arity       -0.73
anwhile     -0.72
concess     -0.72
Collider    -0.69
umn         -0.67
rial        -0.66
Jer         -0.66
ourn        -0.65
Hilbert     -0.65
MpServer    -0.65
Positive Logits
kill        0.97
spree       0.89
kills       0.84
hog         0.84
killers     0.84
killing     0.83
icides      0.83
killer      0.83
icidal      0.81
houses      0.78
Activations Density: 0.020%