INDEX
Explanations
references to mice and related experimental conditions
Neuron Alignment
Index    Value    % of L₁
478      +0.13    0.7%
59       +0.12    0.7%
401      +0.12    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
401      +0.13       0.03
344      +0.12       0.03
118      +0.12       0.02
Negative Logits
Ļ        -2.92
ı        -2.87
Į        -2.73
ĸ´       -2.70
Ľ        -2.67
ħ        -2.65
»        -2.64
¨        -2.63
ĻĤ       -2.61
Ħ        -2.58
Positive Logits
endif        1.70
</           1.55
truth        1.40
&\           1.36
tactic       1.36
Lie          1.36
intuition    1.34
Algebra      1.31
commut       1.30
======       1.29
Activations Density: 0.202%