INDEX
Explanations
Phrases related to separations or exceptions (New Auto-Interp)
Neuron Alignment
Index   Value   % of L₁
1557    +0.14   0.5%
795     +0.12   0.4%
1306    +0.11   0.4%
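Below is a minimal sketch of how a neuron-alignment table like this one could be derived. It assumes that "Value" is the feature's raw decoder weight on each MLP neuron and that "% of L₁" is that weight's absolute value divided by the L₁ norm of the whole decoder vector. The helper `neuron_alignment` and its input `decoder_vec` are hypothetical, and the dashboard does not document its exact computation.

```python
import numpy as np


def neuron_alignment(decoder_vec: np.ndarray, k: int = 3):
    """Return the k neurons with the largest |decoder weight| for one feature.

    Hypothetical helper: assumes `decoder_vec` holds the feature's decoder
    weights over MLP neurons, "Value" is the raw weight, and "% of L1" is
    |weight| / ||decoder_vec||_1.
    """
    l1 = np.abs(decoder_vec).sum()
    # Rank neurons by absolute weight; the stable sort keeps ties in index order.
    order = np.argsort(-np.abs(decoder_vec), kind="stable")[:k]
    return [
        (int(i), float(decoder_vec[i]), float(abs(decoder_vec[i]) / l1))
        for i in order
    ]


# Toy example with four neurons (made-up weights, not the dashboard's data).
for idx, value, frac in neuron_alignment(np.array([0.1, -0.4, 0.3, 0.2])):
    print(f"{idx:>5}  {value:+.2f}  {frac:.1%}")
```

Ranking by absolute value means strongly negative weights would also appear. All three rows shown above happen to be positive.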
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1557    +0.14           0.02
757     +0.12           0.02
890     +0.11           0.02
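The following is a sketch of the two correlation columns, the Pearson correlation ("P. Corr.") and the cosine similarity. It assumes that both are computed between the feature's activations and each neuron's activations over the same set of tokens. The dashboard does not say this. The cosine similarity might instead be taken between weight vectors. The helper name `correlated_neurons` and its array shapes are hypothetical.

```python
import numpy as np


def correlated_neurons(feature_acts: np.ndarray, neuron_acts: np.ndarray, k: int = 3):
    """Rank neurons by Pearson correlation with one feature's activations.

    Hypothetical helper: `feature_acts` has shape (n_tokens,) and
    `neuron_acts` has shape (n_tokens, n_neurons), both recorded on the
    same tokens. Constant columns give NaN correlations.
    """
    # Pearson correlation is the cosine similarity of the mean-centred vectors.
    f = feature_acts - feature_acts.mean()
    n = neuron_acts - neuron_acts.mean(axis=0)
    pearson = (f @ n) / (np.linalg.norm(f) * np.linalg.norm(n, axis=0))
    # Plain cosine similarity of the raw (uncentred) activation vectors.
    cos = (feature_acts @ neuron_acts) / (
        np.linalg.norm(feature_acts) * np.linalg.norm(neuron_acts, axis=0)
    )
    order = np.argsort(-pearson, kind="stable")[:k]
    return [(int(i), float(pearson[i]), float(cos[i])) for i in order]


# Toy data: 4 tokens, 3 neurons; neuron 0 is exactly 2x the feature.
feature = np.array([0.0, 1.0, 0.0, 2.0])
neurons = np.array([
    [0.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [4.0, 0.0, 0.0],
])
for idx, r, c in correlated_neurons(feature, neurons):
    print(f"{idx:>5}  {r:+.2f}  {c:.2f}")
```

Mean-centring separates the two measures. For non-negative activations, the uncentred cosine is usually at least as large as the Pearson value, so a cosine as low as 0.02 alongside a correlation of +0.14 may hint that the dashboard computes the cosine differently. That is an inference, not something the page states.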
Negative Logits
Thomas      -0.50
Ver         -0.49
Ve          -0.48
Lu          -0.47
Thomas      -0.46
Lu          -0.46
Coc         -0.46
Na          -0.46
ver         -0.46
Ver         -0.45
Positive Logits
fhort        1.02
madonna      1.01
ftre         1.01
paillettes   1.00
poff         0.99
bordeaux     0.96
ecru         0.96
chèvre       0.96
outlander    0.96
difp         0.95
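The sketch below shows how positive and negative logit lists like these could be produced as a direct logit effect. It projects the feature's output direction onto the unembedding and takes the top and bottom k tokens. Several things here are assumptions and not taken from the dashboard: the helper `logit_effects`, the inputs `direction`, `W_U`, and `vocab`, and the use of an unscaled dot product. For a feature defined over MLP neurons, `direction` would first have to be mapped into the residual stream through the MLP output weights. The dashboard's values may also be rescaled in a way this sketch does not reproduce.

```python
import numpy as np


def logit_effects(direction: np.ndarray, W_U: np.ndarray, vocab: list[str], k: int = 10):
    """Top-k positive and negative direct logit effects of one feature.

    Hypothetical helper: `direction` is the feature's output direction in
    the residual stream, shape (d_model,); `W_U` is the unembedding matrix,
    shape (d_model, n_vocab); `vocab[j]` is the string for token j.
    """
    scores = direction @ W_U  # one score per vocabulary token
    order = np.argsort(scores, kind="stable")  # ascending
    negative = [(vocab[j], float(scores[j])) for j in order[:k]]
    positive = [(vocab[j], float(scores[j])) for j in order[::-1][:k]]
    return positive, negative


# Toy example: identity unembedding over a three-token vocabulary.
pos, neg = logit_effects(np.array([0.5, -1.0, 2.0]), np.eye(3), ["a", "b", "c"], k=1)
print(pos, neg)
```

Because tokens are ranked by score rather than deduplicated by string, near-identical entries such as `Thomas` or `Lu` can appear more than once if the tokenizer has several variants of them.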
Activation Density: 0.085%
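Activation density is read here as the fraction of tokens on which the feature is active. A density of 0.085% is 0.00085, which works out to about 850 active tokens per million, or roughly 1 in 1,200. The minimal sketch below assumes that "active" means an activation strictly above zero. The dashboard does not state its threshold, and the helper `activation_density` is hypothetical.

```python
import numpy as np


def activation_density(feature_acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature fires (activation > threshold).

    Assumes "active" means strictly above `threshold`; the actual threshold
    used by the dashboard is not stated.
    """
    return float(np.mean(feature_acts > threshold))


# Toy check: 85 active tokens out of 100,000 gives a density of 0.085%.
acts = np.zeros(100_000)
acts[:85] = 0.3
print(f"{activation_density(acts):.3%}")
```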