INDEX
Explanations
quantitative assessments of interventions or conditions
Neuron Alignment
Index   Value   % of L₁
156     +0.26   1.5%
30      +0.13   0.7%
443     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
443     +0.26      0.02
30      +0.13      0.03
25      +0.12      0.04
Negative Logits
unction       -1.54
assumption    -1.49
]=            -1.44
ubation       -1.38
TE            -1.37
conjugation   -1.36
ÃŁe           -1.35
unnumbered    -1.33
ncia          -1.33
)\|_{         -1.33
Positive Logits
shooter       1.67
featured      1.55
fireplace     1.53
magazines     1.52
strikes       1.50
istically     1.50
native        1.47
paradise      1.44
poster        1.44
!(            1.43
Activations Density 0.354%