INDEX
Explanations
text describing details and characteristics, potentially related to planning or visualization
Neuron Alignment
Index   Value   % of L₁
1842    +0.19   0.6%
184     +0.15   0.5%
1343    +0.13   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
184     +0.19      0.02
394     +0.15      0.04
1842    +0.13      0.04
Negative Logits
affor      -2.07
encomp     -2.06
reluct     -1.99
philanth   -1.97
accla      -1.96
impra      -1.96
increa     -1.95
depic      -1.94
embra      -1.94
strick     -1.93
Positive Logits
etc             0.69
GraphicsUnit    0.68
different       0.68
techniques      0.67
patterns        0.67
relationships   0.64
sizes           0.64
ഇ               0.63
variables       0.63
từng            0.63
Activations Density: 0.338%