Explanations
references to physical locations and spatial relationships
Negative Logits
jer  -0.17
errer  -0.16
.annot  -0.15
mue  -0.14
iaux  -0.14
есп  -0.14
рус  -0.14
tual  -0.14
APPER  -0.14
659  -0.14
Positive Logits
heads  0.30
-head  0.26
head  0.25
roof  0.24
heads  0.24
top  0.24
roofs  0.23
head  0.23
rooft  0.22
roof  0.22
Activation Density: 0.101%
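As a hedged illustration of where figures like these usually come from, here is a minimal sketch. It assumes, without confirmation from this page, that activation density is the fraction of token positions where the feature's activation is positive, and that the positive and negative logits are dot products of the feature's decoder direction with each column of the model's unembedding matrix. The function names `activation_density` and `top_logits` and all inputs are hypothetical.

```python
from typing import List, Sequence, Tuple

def activation_density(activations: Sequence[float]) -> float:
    """Fraction of token positions where the (hypothetical) feature activation is > 0."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)

def top_logits(
    decoder_dir: Sequence[float],        # assumed feature direction, length d_model
    unembed: Sequence[Sequence[float]],  # assumed unembedding matrix, d_model x vocab
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) tokens by direct logit effect."""
    scores = []
    for j, tok in enumerate(vocab):
        # Logit effect on token j: dot product of the direction with unembedding column j.
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, s))
    ranked = sorted(scores, key=lambda p: p[1])  # ascending by score
    positive = ranked[-k:][::-1]                 # largest first
    negative = ranked[:k]                        # most negative first
    return positive, negative
```

Under this assumed definition, a density of 0.101% means the feature fires on roughly 1 token in 990.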