Explanations
phrases that describe spatial relationships and positioning
NEGATIVE LOGITS
Token          Logit
Depth          -0.17
eneral         -0.15
inium          -0.15
voy            -0.14
ighth          -0.14
.connection    -0.14
ħn             -0.14
uctions        -0.14
acades         -0.14
depth          -0.13
POSITIVE LOGITS
Token          Logit
front          0.39
front          0.34
FRONT          0.30
-front         0.29
_front         0.24
Front          0.23
fron           0.23
.front         0.21
fronts         0.21
Front          0.21
Activations Density: 0.184%