Explanations
terms related to manipulation or manipulative actions
terms related to manipulation and control
Negative Logits
çĦ          -0.82
hood        -0.77
ness        -0.73
Republic    -0.71
zl          -0.70
fighter     -0.68
gap         -0.68
Blessed     -0.67
Ath         -0.65
senal       -0.65
Positive Logits
manipulate      0.98
manip           0.97
manipulating    0.89
eering          0.87
manipulated     0.87
manipulation    0.86
glers           0.81
levers          0.81
ulators         0.80
ulations        0.78
Activations Density: 0.021%