Explanations
references to death or lethal events
Negative Logits
bÃło      -0.18
Trab      -0.15
Disorder  -0.14
олева     -0.14
HRESULT   -0.14
ÃŃme      -0.14
asurer    -0.14
lfw       -0.14
ÅĻes      -0.14
ancell    -0.13
Positive Logits
die       0.65
died      0.59
dies      0.56
die       0.52
DIE       0.49
Die       0.48
_die      0.47
Die       0.46
dying     0.45
Dies      0.41
Activations Density 0.213%
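
As a rough illustration of how a density figure like the one above might be obtained, the sketch below assumes that "activation density" means the fraction of sampled tokens on which the feature has a nonzero activation. That definition, the function name `activation_density`, and the toy data are assumptions made for illustration; they are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (# tokens where the feature fires) / (# tokens sampled).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy sample: the feature fires on 3 of 1000 tokens.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.213% would mean the feature fires on roughly 2 of every 1000 sampled tokens, which is consistent with a feature tied to a narrow family of tokens such as the "die" variants listed above.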