Explanations
words related to power or influence
references to the concept of influence
Negative Logits
Token   Logit
ft      -0.75
TAG     -0.74
leigh   -0.73
fal     -0.73
DEF     -0.72
ITIES   -0.72
ãĤ©     -0.71
asar    -0.71
ears    -0.70
eared   -0.70
Positive Logits
Token         Logit
pedd          1.18
shaping       0.99
cooker        0.98
influencing   0.97
exerted       0.91
influence     0.84
sway          0.79
influences    0.77
influ         0.77
influenced    0.72
Activations Density: 0.058%