INDEX
Explanations
concepts of change and movement in various contexts
Negative Logits
Token     Logit
upward    -0.15
atica     -0.14
outr      -0.14
ucht      -0.14
luv       -0.13
onn       -0.13
mina      -0.13
roke      -0.13
irl       -0.13
aida      -0.13
Positive Logits
Token     Logit
towards   0.64
toward    0.63
away      0.56
Towards   0.47
Towards   0.46
Away      0.44
Tow       0.43
away      0.42
hacia     0.40
Away      0.38
Activations Density 0.096%