INDEX
Explanations
references to death or dying
occurrences of the word "die" and its context
Negative Logits
Altern    -0.71
orney     -0.69
artney    -0.66
Mand      -0.65
rouse     -0.65
OPER      -0.64
OR        -0.63
Mand      -0.63
Kag       -0.62
Grab      -0.62
Positive Logits
die        1.09
getic      0.90
dies       0.84
bold       0.82
fighter    0.81
ffen       0.78
die        0.76
fighters   0.74
dying      0.74
ously      0.73
Activations Density 0.008%
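
To put the density figure in concrete terms: assuming it means the fraction of token positions on which the feature has a nonzero activation (a common reading of this statistic, not something this page states), 0.008% corresponds to roughly 8 active positions per 100,000 tokens. The sketch below illustrates that arithmetic; the function name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all. The function name and threshold are illustrative.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 8 of which activate the feature.
acts = [0.0] * 100_000
for i in range(8):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density * 100:.3f}%")
```

A feature this sparse fires only on a narrow slice of text, which is consistent with the explanations above describing it as specific to "die" and related references to death.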