Explanations
references to evil or negativity
references to the concept of "evil."
Negative Logits
Token            Logit
ribe             -0.75
UNCH             -0.74
GN               -0.71
Lago             -0.70
ainers           -0.70
PsyNetMessage    -0.70
ribes            -0.70
cially           -0.69
RESULTS          -0.69
aro              -0.67
Positive Logits
Token            Logit
incarn           1.01
deed             0.89
undermin         0.89
deeds            0.89
genius           0.87
evil             0.86
mastermind       0.86
nesses           0.83
lord             0.83
twin             0.83
Activations Density: 0.015%