INDEX
Explanations
terms related to a direction, either physical or metaphorical
references to decisions and the consequences of choices
Negative Logits
nces       -0.76
rities     -0.73
abilia     -0.72
iencies    -0.71
chens      -0.70
nice       -0.70
unts       -0.70
auna       -0.70
ombies     -0.68
iuses      -0.67
Positive Logits
toward       1.13
towards      1.09
tread        0.97
direction    0.95
directions   0.92
paths        0.89
footsteps    0.88
trajectory   0.86
Directions   0.86
downward     0.82
Activations Density: 0.355%