INDEX
Explanations
phrases indicating proximity or orientation
Negative Logits
ĴĪ       -0.20
brains   -0.16
ÃŃd      -0.15
691      -0.15
holes    -0.14
aju      -0.14
zones    -0.14
ÃŃl      -0.14
ÏģοÏį    -0.14
oyo      -0.14
Positive Logits
corner   0.57
corner   0.50
Corner   0.45
Corner   0.44
-corner  0.43
corners  0.38
bend     0.37
Bend     0.31
_corner  0.27
comer    0.26
Activations Density: 0.013%