INDEX
Explanations
descriptions of tasks or activities
Neuron Alignment
Index    Value    % of L₁
856      +0.10    0.3%
776      +0.10    0.3%
1485     +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
856      +0.10       0.04
776      +0.10       0.05
1207     +0.08       0.04
Negative Logits
gaily        -0.92
plenti       -0.90
vainly       -0.86
beaute       -0.85
unspeak      -0.84
wherea       -0.83
tolerably    -0.82
shenan       -0.81
Augu         -0.78
depic        -0.77
Positive Logits
yogur        0.62
kür          0.59
tuong        0.57
ideolog      0.57
nhanh        0.56
funghi       0.56
capulco      0.55
quick        0.55
breve        0.55
months       0.55
Activations Density 0.224%