INDEX
Explanations: statements indicating the nature of comparisons or evaluations
Neuron Alignment
Index   Value   % of L₁
431     +0.12   0.7%
448     +0.11   0.6%
157     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
157     +0.12      0.04
356     +0.11      0.02
244     +0.11      0.03
Negative Logits
vigil       -1.65
aring       -1.56
liberty     -1.40
dedicated   -1.39
alive       -1.38
inde        -1.37
courts      -1.36
discovers   -1.34
den         -1.34
usal        -1.33
Positive Logits
ORT         1.42
pony        1.41
pairing     1.38
aceae       1.35
aside       1.34
Ïī          1.31
correlate   1.31
vocals      1.31
åĬĽ         1.31
oS          1.31
Activations Density 0.222%