Explanations
verbs and phrases related to guiding or directing actions
Negative Logits
Corpus       -0.76
ropolitan    -0.75
ylon         -0.74
enegger      -0.71
upon         -0.68
覚醒         -0.67
ITNESS       -0.66
ocalyptic    -0.65
Leban        -0.64
Ming         -0.64
Positive Logits
toward       1.02
towards      0.97
steer        0.95
steered      0.95
wheel        0.94
clear        0.92
away         0.87
downwards    0.83
wheel        0.78
steering     0.78
Activations Density 0.007%