INDEX
Explanations
phrases related to exerting influence or control over others
Negative Logits
Token     Logit
884       -0.17
íĻĺ       -0.17
chant     -0.17
etch      -0.17
erner     -0.15
916       -0.15
ills      -0.15
/story    -0.15
vore      -0.14
azi       -0.14
Positive Logits
Token     Logit
pull      0.41
pulling   0.36
pull      0.36
Pull      0.35
pulls     0.34
Pull      0.34
pulled    0.33
.Pull     0.29
.pull     0.27
_pull     0.27
Activations Density: 0.038%
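
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a number is commonly computed. It assumes density means the fraction of token positions on which the feature's activation is above zero; the page itself does not define it. The function name `activation_density`, the `threshold` parameter, and the toy counts are all hypothetical and were chosen only to illustrate the arithmetic; they are not taken from the dashboard's data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions on which the
    feature fires at all. The dashboard does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): 100,000 token positions, 38 of them active.
toy_activations = [0.0] * 99_962 + [1.5] * 38
density = activation_density(toy_activations)
density_label = f"{density:.3%}"  # formatted as a percentage string
```

A density this low would mean the feature fires on only a few tokens in every ten thousand, which fits a feature tied to a narrow set of tokens such as the "pull" variants listed above.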