INDEX
Explanations
phrases related to moral judgements, particularly the concept of evil
instances of the word "evil" and its associated contexts
Negative Logits
ribe            -0.78
Lago            -0.78
PsyNetMessage   -0.77
ribes           -0.76
illation        -0.75
UNCH            -0.73
lov             -0.71
aro             -0.71
RESULTS         -0.71
drops           -0.70
Positive Logits
incarn          1.04
evil            0.96
mastermind      0.87
enemy           0.87
villain         0.86
genius          0.86
twin            0.85
undermin        0.85
adversary       0.84
deed            0.83
Activations Density: 0.013%