Explanations
instances of violence and death
Negative Logits
ampie    -0.17
cka      -0.17
egot     -0.16
upt      -0.15
inki     -0.15
uffman   -0.15
plib     -0.15
egend    -0.15
abant    -0.15
erval    -0.14
Positive Logits
unconscious    0.31
conv           0.28
gas            0.28
conscious      0.26
gas            0.25
conscious      0.24
consciousness  0.24
motion         0.24
struggling     0.24
woo            0.23
Activations Density: 0.364%