INDEX
Explanations
phrases that contain numbers or measurements
Neuron Alignment
Index   Value   % of L₁
2034    +0.25   0.8%
1535    +0.20   0.6%
1699    +0.16   0.5%
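The "% of L₁" column is read here as each neuron's share of the L₁ norm of the feature's decoder weights. The sketch below shows that computation under this assumption. The function `neuron_alignment` is hypothetical, the ranking by signed value (largest first) is an assumption, and the toy vector is illustrative rather than this feature's weights.

```python
import numpy as np


def neuron_alignment(decoder_weights, k=3):
    """Return (neuron index, weight, share of L1 norm) for the k largest weights.

    Assumption: decoder_weights is the feature's decoder vector over MLP neurons,
    and "% of L1" means |w_i| / sum_j |w_j|. Ranking is by signed value, largest first.
    """
    w = np.asarray(decoder_weights, dtype=float)
    l1_norm = np.abs(w).sum()
    top = np.argsort(-w)[:k]  # indices of the k largest signed weights
    return [(int(i), float(w[i]), float(abs(w[i]) / l1_norm)) for i in top]


# Toy decoder vector (illustrative only, not this feature's weights).
rows = neuron_alignment([0.1, -0.5, 0.3, 0.05, 0.2], k=3)
for idx, value, share in rows:
    print(f"{idx:<6} {value:+.2f}  {share:.1%}")
```

Small shares such as 0.8% mean the decoder weight is spread over many neurons rather than concentrated on a few.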
Correlated Neurons
Index   P. Corr.   Cos Sim.
382     +0.25      0.11
1535    +0.20      0.09
1775    +0.16      0.08
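Assuming that P. Corr. (Pearson correlation) and Cos Sim. are both computed between this feature's activations and each neuron's activations over the same token positions, the two columns differ only in mean-centering: Pearson correlation is the cosine similarity of the centered series. A minimal sketch of that assumption follows; `pearson_and_cossim` and the toy series are hypothetical.

```python
import numpy as np


def pearson_and_cossim(feature_acts, neuron_acts):
    """Pearson correlation and cosine similarity between two activation series.

    Assumption: both series are recorded over the same token positions.
    Pearson correlation is the cosine similarity after subtracting each mean.
    """
    x = np.asarray(feature_acts, dtype=float)
    y = np.asarray(neuron_acts, dtype=float)
    cossim = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    xc, yc = x - x.mean(), y - y.mean()
    pearson = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))
    return float(pearson), float(cossim)


# Toy series: a sparse feature and a neuron that is always somewhat active.
feature = [0.0, 0.0, 2.0, 0.0, 1.5, 0.0]
neuron = [0.3, 0.1, 0.9, 0.2, 0.4, 0.5]
p, c = pearson_and_cossim(feature, neuron)
print(f"P. Corr. {p:+.2f}  Cos Sim. {c:.2f}")
```

Because centering removes each series' baseline, the two numbers can diverge noticeably for a sparse feature paired with a neuron that has a nonzero resting activation.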
Negative Logits
repug          -1.07
disagre        -0.99
suspic         -0.99
viciss         -0.99
Leurs          -0.98
pamph          -0.97
unwarran       -0.95
rodriguez      -0.94
Souha          -0.93
practition     -0.92
Positive Logits
These              0.74
Lastly             0.72
Finally            0.71
.                  0.69
↵↵                 0.68
Both               0.68
This               0.66
Additionally       0.66
ConverterFactory   0.65
}.                 0.63
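The two logit lists are commonly produced by projecting the feature's decoder direction through the model's unembedding and reading off the most negative and most positive entries. A minimal sketch under these assumptions: the direction already lives in the residual stream, `W_U` has shape (d_model, vocab_size), and the final layer norm is ignored. If the SAE reconstructs MLP neuron activations, the decoder vector would first need mapping through the MLP output weights. The function `logit_effects` and the toy model are hypothetical.

```python
import numpy as np


def logit_effects(decoder_dir, W_U, vocab, k=3):
    """Rank tokens by the feature's direct effect on the logits.

    Assumptions: decoder_dir is the feature's decoder direction already expressed in
    the residual stream, W_U is the unembedding matrix of shape (d_model, vocab_size),
    and the final layer norm is ignored.
    """
    effects = np.asarray(decoder_dir, dtype=float) @ np.asarray(W_U, dtype=float)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy model with d_model = 2 and a four-token vocabulary (illustrative only).
W_U = np.array([[0.5, -1.0, 2.0, 0.0],
                [9.0, 9.0, 9.0, 9.0]])
neg, pos = logit_effects([1.0, 0.0], W_U, ["a", "b", "c", "d"], k=2)
print("negative:", neg)
print("positive:", pos)
```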
Activation Density: 0.509%
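Activation density is the percentage of tokens on which the feature fires; at 0.509%, that is roughly one token in 200. A minimal sketch, assuming "fires" means an activation strictly above zero (the function name and the `threshold` parameter are hypothetical):

```python
import numpy as np


def activation_density(feature_acts, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold.

    Assumption: "fires" means an activation strictly greater than zero.
    """
    acts = np.asarray(feature_acts, dtype=float)
    return 100.0 * float((acts > threshold).mean())


# Toy activations over eight tokens: the feature fires on two of them.
print(activation_density([0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0]))
```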