INDEX
Explanations
user interface terms and elements related to UI design
Neuron Alignment
Index  Value  % of L₁
12     +0.13  0.7%
122    +0.12  0.7%
24     +0.12  0.7%
Correlated Neurons
Index  P. Corr.  Cos Sim.
122    +0.13     0.07
188    +0.12     0.01
78     +0.12     0.04
Negative Logits
Ĺ     -2.52
©     -2.46
¤     -2.34
ľĵ    -2.33
§     -2.27
Ī     -2.27
ļ     -2.24
´     -2.24
Ļ     -2.19
ĸ     -2.17
Positive Logits
Contest       1.43
Armed         1.37
Encyclopedia  1.36
]"            1.36
rea           1.32
]:            1.30
Problem       1.28
anthem        1.28
]."           1.26
conspir       1.25
Activations Density: 3.875%