INDEX
Explanations
references to death and the death penalty
references to the death penalty
Negative Logits
CN       -0.78
umar     -0.75
Cola     -0.75
MN       -0.73
ECA      -0.72
Americ   -0.72
EEK      -0.71
ĸļ       -0.69
Avg      -0.67
æ©Ł      -0.65
Positive Logits
blow     1.04
toll     0.97
guard    0.95
adder    0.94
stroke   0.94
locked   0.93
match    0.93
trap     0.91
thro     0.88
bed      0.88
Activations Density: 0.049%