Explanations
mentions of different types of windows or window-related terms in the text
Neuron Alignment
Index  Value  % of L₁
156    +0.23  1.3%
473    +0.11  0.7%
211    +0.11  0.6%
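The Neuron Alignment table can be reproduced in a few lines. This assumes (the page does not say) that "Value" is the entry of the feature's weight vector at each neuron index and that "% of L₁" is that entry's absolute value as a share of the vector's L₁ norm. Whether the tool reads the encoder or the decoder vector is also an assumption. The helper name `neuron_alignment` and the toy numbers are hypothetical.

```python
def neuron_alignment(weights, k=3):
    """Return the top-k neurons of a feature's weight vector.

    Hypothetical helper: `weights` is assumed to be the feature's encoder or
    decoder vector, indexed by neuron. Each result is (index, value, pct_of_l1).
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []
    # Rank neuron indices by descending weight (most positive first)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    # "% of L1": the neuron's share of the vector's total absolute weight
    return [(i, weights[i], 100.0 * abs(weights[i]) / l1) for i in ranked[:k]]


# Toy 4-neuron vector (made-up numbers, not the feature above)
print(neuron_alignment([0.0, 0.5, -0.25, 0.25]))
# → [(1, 0.5, 50.0), (3, 0.25, 25.0), (0, 0.0, 0.0)]
```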
Correlated Neurons
Index  Pearson Corr.  Cosine Sim.
211    +0.23          0.02
148    +0.11          0.01
473    +0.11          0.02
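A minimal sketch of the two Correlated Neurons columns, assuming both compare the feature's activations with a neuron's activations over the same tokens. Pearson correlation is the cosine similarity of the mean-centred vectors, which is one way a neuron can score a much higher Pearson value than cosine similarity, as in the table. The function names and data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson(u, v):
    """Pearson correlation: cosine similarity after subtracting each vector's mean."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


# Toy activations over 4 tokens (made-up numbers): the feature fires on one token,
# the neuron is negative everywhere except on that token.
feature_acts = [0.0, 0.0, 1.0, 0.0]
neuron_acts = [-1.0, -1.0, 1.0, -1.0]
print(round(pearson(feature_acts, neuron_acts), 3), round(cosine(feature_acts, neuron_acts), 3))
# → 1.0 0.5
```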
Negative Logits
ĵ               -2.98
ĥ½              -2.95
<|outofrange|>  -2.70
                -2.70
                -2.70
↵ ↵             -2.70
↵               -2.70
↵               -2.70
                -2.70
↵               -2.70
Positive Logits
facing   1.85
viewed   1.70
faced    1.61
istor    1.60
$^       1.59
himself  1.58
sized    1.56
illa     1.54
height   1.52
ache     1.50
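Logit lists like these are commonly produced by projecting the feature's output direction through the model's unembedding matrix and reading off the most negative and most positive vocabulary entries. The sketch below assumes exactly that and ignores the final LayerNorm; `logit_effects`, `w_u`, and the toy vocabulary are hypothetical stand-ins, not the dashboard's code.

```python
def logit_effects(direction, w_u, vocab, k=3):
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper: `direction` is a d_model-length feature direction and
    `w_u` is the unembedding matrix as d_model rows of vocab-length lists.
    The final LayerNorm is ignored in this sketch.
    """
    n_vocab = len(vocab)
    # Direct logit effect of the direction on each vocabulary token: direction @ W_U
    logits = [
        sum(direction[d] * w_u[d][t] for d in range(len(direction)))
        for t in range(n_vocab)
    ]
    order = sorted(range(n_vocab), key=lambda t: logits[t])  # ascending
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy model: d_model = 2, vocabulary of 3 tokens (made-up numbers)
w_u = [[2.0, -1.0, 0.5],
       [9.0, 9.0, 9.0]]
print(logit_effects([1.0, 0.0], w_u, ["facing", "the", "viewed"], k=1))
# → ([('the', -1.0)], [('facing', 2.0)])
```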
Activation Density: 0.082%
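Activation density is usually reported as the percentage of tokens on which the feature fires. Assuming "fires" means an activation above zero (the threshold here is an assumption), 0.082% corresponds to roughly 82 firing tokens per 100,000. The helper `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 82 firing tokens out of 100,000 gives the 0.082% shown above
acts = [1.0] * 82 + [0.0] * (100_000 - 82)
print(activation_density(acts))
```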