INDEX
Explanations
phrases related to directions or trajectories
concepts related to direction or guidance
Negative Logits
Token       Logit
enty        -0.75
roma        -0.73
athered     -0.73
nikov       -0.72
esters      -0.68
reditary    -0.68
ammy        -0.67
bley        -0.67
akov        -0.66
unker       -0.66
Positive Logits
Token       Logit
direction   1.18
ality       1.17
directions  1.11
towards     0.99
toward      0.99
finder      0.94
finding     0.92
ally        0.87
posts       0.84
eering      0.84
Activations Density: 0.046%