INDEX
Explanations
conditional statements, particularly those involving "if" and "else"
Neuron Alignment
Index   Value   % of L₁
376     +0.15   0.8%
244     +0.14   0.8%
369     +0.14   0.8%
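The "% of L₁" column is not defined on this page. One plausible reading, stated here as an assumption, is that it gives the absolute value of a neuron's entry divided by the L₁ norm (sum of absolute values) of the whole vector. A minimal sketch of that calculation follows; the function name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| divided by the L1 norm of the whole vector.

    Assumption: this mirrors what a "% of L1" column reports. The page
    itself does not confirm the definition.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical 4-entry vector, chosen only so the arithmetic is easy to check.
toy_weights = [0.5, -0.25, 0.125, 0.125]
print(f"{l1_share(toy_weights, 0):.1%}")  # prints 50.0%
```

Under this reading, a value of +0.15 listed at 0.8% would correspond to a total L₁ norm of roughly 0.15 / 0.008 ≈ 19, though that figure is an inference from rounded numbers and not something the page reports.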
Correlated Neurons
Index   P. Corr.   Cos Sim.
206     +0.15      0.04
383     +0.14      0.05
411     +0.14      0.04
Negative Logits
Token        Logit
fully        -1.53
?”           -1.52
References   -1.37
ERN          -1.35
aurus        -1.35
VEN          -1.35
ERV          -1.34
orate        -1.31
å¦Ĥ          -1.28
anic         -1.27
Positive Logits
Token           Logit
(!(             1.71
(!              1.62
INFO            1.45
circumstances   1.45
(!              1.39
conditions      1.38
([              1.37
(((             1.36
ask             1.34
acic            1.34
Activations Density 0.146%