INDEX
Explanations
references to events or categories related to death or mortality
Negative Logits
,     -1.04
in    -0.98
et    -0.97
to    -0.96
-     -0.92
y     -0.90
se    -0.90
de    -0.89
di    -0.89
ha    -0.88
Positive Logits
itſelf      2.25
ſelves      2.05
myſelf      2.01
Jefus       1.99
ſelf        1.98
pleaſure    1.98
Monfieur    1.96
Reſ         1.93
raiſ        1.92
Anſ         1.91
Activations Density 0.512%