INDEX
Explanations
references to human nature and moral depravity
Negative Logits
लब              -0.18
.updateDynamic  -0.17
ptime           -0.17
cheon           -0.17
nze             -0.16
istrovství      -0.16
bla             -0.16
\brief          -0.16
_PT             -0.15
ukan            -0.15
Positive Logits
fallen                0.20
Adam                  0.19
since                 0.18
Fallen                0.18
nature                0.18
(token not rendered)  0.18
corruption            0.17
Seed                  0.17
ven                   0.17
fell                  0.16
Activations Density 0.046%