Explanations
Words related to evil or negative characteristics
References to the concept of "evil"
Negative Logits
Token            Logit
akeru            -0.78
dropping         -0.77
eding            -0.76
GN               -0.75
PsyNetMessage    -0.74
UNCH             -0.73
ribe             -0.73
kees             -0.73
amen             -0.72
aro              -0.72
Positive Logits
Token            Logit
incarn            1.03
deeds             0.94
deed              0.90
nesses            0.89
mastermind        0.87
twin              0.83
genius            0.81
lord              0.77
evil              0.77
NESS              0.77
Activation density: 0.028%