Explanations
Words associated with directional terms, particularly "left" and "right".
Negative Logits
hir  -0.17
ungle  -0.14
oir  -0.14
edo  -0.14
:animated  -0.14
况  -0.14
MainAxisAlignment  -0.14
idia  -0.14
817  -0.14
ombat  -0.14
Positive Logits
y  0.21
ies  0.16
yg  0.15
ye  0.15
anka  0.15
J  0.15
yx  0.15
G  0.14
agn  0.14
字  0.14
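Dashboards of this kind typically build the negative and positive logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most extreme scores. The sketch below assumes that approach. The function names `logit_effects` and `top_and_bottom`, the placeholder tokens, and the toy numbers are all hypothetical and are not this feature's real weights.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# Assumption: each token's score is the dot product of the feature's decoder
# vector with that token's column of the unembedding matrix.

def logit_effects(decoder_vec, unembed_cols):
    """Return one score per token: dot(decoder_vec, unembedding column)."""
    return [sum(d * u for d, u in zip(decoder_vec, col)) for col in unembed_cols]

def top_and_bottom(tokens, scores, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative

# Toy inputs (hypothetical placeholders, not real model weights).
tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_vec = [0.5, -0.2]
unembed_cols = [[0.1, 0.3], [0.0, 0.1], [0.4, -0.1], [-0.3, 0.1]]

scores = logit_effects(decoder_vec, unembed_cols)
pos, neg = top_and_bottom(tokens, scores, k=2)
print("positive:", [t for t, _ in pos])
print("negative:", [t for t, _ in neg])
```

With these toy inputs, `tok_c` has the largest score and `tok_d` the smallest, so they head the positive and negative lists respectively.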
Activation Density 0.016%
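Activation density is usually read as the fraction of token positions in an evaluation corpus where the feature's activation is above zero. At 0.016%, that is about 16 of every 100,000 tokens. Here is a minimal sketch of the calculation. The threshold of 0 and the helper name `activation_density` are assumptions, because the corpus and exact threshold behind this number are not stated.

```python
# Minimal sketch: activation density = (positions where feature fires) / (all positions).
# Assumption: "fires" means activation strictly greater than the threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Toy example: 16 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(16):
    acts[i * 5000] = 1.0  # place 16 nonzero activations, spaced apart

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.016%
```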