INDEX
Explanations
phrases or words related to power or influence
repeated mentions of the word "powerful."
Negative Logits
Token      Logit
chal       -0.86
forth      -0.84
eday       -0.83
cision     -0.77
ajor       -0.76
kay        -0.73
den        -0.72
©¶æ        -0.71
ey         -0.70
aked       -0.70
Positive Logits
Token      Logit
chords     0.96
enough     0.95
vested     0.87
chord      0.86
motiv      0.82
impulses   0.81
machinery  0.80
emotions   0.79
emotion    0.77
adversary  0.77
Activations Density 0.038%