Explanations
references to specific metrics or indicators, often in a table format
Neuron Alignment
Index    Value    % of L₁
478      +0.17    0.9%
53       +0.12    0.7%
81       +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
53       +0.17       0.02
401      +0.12       0.01
340      +0.11       0.02
Negative Logits
ies       -1.72
heads     -1.67
iles      -1.60
és        -1.55
ids       -1.53
astics    -1.52
edly      -1.51
oning     -1.51
works     -1.50
anes      -1.50
Positive Logits
leans        1.54
grep         1.45
jour         1.41
dated        1.40
prescribe    1.39
date         1.36
inel         1.35
agra         1.35
Wire         1.35
½            1.33
Activations Density 0.062%