INDEX
Explanations
words related to physical objects and their locations in a room
Neuron Alignment
Index    Value    % of L₁
544      +0.15    0.6%
1839     +0.12    0.4%
1187     +0.12    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
544      +0.15       0.02
1141     +0.12       0.02
1137     +0.12       0.02
Negative Logits
ôtel          -0.57
تضيفلها       -0.52
OGND          -0.50
thè           -0.50
uminazione    -0.50
ften          -0.47
Atenas        -0.46
ventude       -0.46
躇            -0.46
vernac        -0.46
Positive Logits
mirror       1.36
Mirror       1.30
mirrors      1.28
mirror       1.20
Mirror       1.18
Mirrors      1.15
mirroring    1.09
Mirrors      1.08
MIRROR       1.07
mirrored     1.04
Activations Density 0.068%