INDEX
Explanations
mentions of or references to death
references to death
NEGATIVE LOGITS
Cola        -0.87
umar        -0.80
Avg         -0.75
ECA         -0.74
kef         -0.74
OPER        -0.73
CN          -0.71
ĸļ          -0.70
æ©Ł         -0.70
EEK         -0.70
POSITIVE LOGITS
toll        0.96
bed         0.89
blow        0.89
stroke      0.87
guard       0.80
adder       0.80
match       0.77
psychiat    0.76
Toll        0.76
penalty     0.76
Activations Density: 0.031%