INDEX
Explanations
references to death or dying
Negative Logits
Token          Logit
weise          -0.18
.tt            -0.17
.GroupLayout   -0.15
bildung        -0.15
credible       -0.15
ial            -0.15
ạy             -0.15
adera          -0.14
desc           -0.14
mesinin        -0.14
Positive Logits
Token          Logit
Dead           0.24
dead           0.22
Dead           0.22
lier           0.22
.dead          0.21
ening          0.20
locks          0.19
DEAD           0.19
ened           0.19
dead           0.19
Activations Density: 0.018%