INDEX
Explanations
instances of the word "alone."
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
282     +0.13   0.7%
87      +0.11   0.6%
172     +0.10   0.5%
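The dashboard does not define "% of L₁". Here is a minimal sketch, assuming it means the absolute value of each neuron's weight in the feature's decoder vector divided by that vector's L₁ norm. `neuron_alignment` and its toy input are hypothetical and are not the dashboard's own code.

```python
def neuron_alignment(decoder_vec, k=3):
    """Rank neurons by |weight| in a feature's decoder vector (hypothetical input).

    Assumption: "% of L1" = |w_i| / sum_j |w_j|, returned here as a fraction.
    Returns (neuron index, signed weight, share of L1) for the top k neurons.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    ranked = sorted(range(len(decoder_vec)),
                    key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Toy 4-neuron decoder vector, not real model weights.
print(neuron_alignment([0.25, -0.5, 0.125, 0.125], k=2))
# → [(1, -0.5, 0.5), (0, 0.25, 0.25)]
```

Under this reading, a share of 0.7% means that no single neuron carries much of the feature's weight. The feature would then be spread across many neurons.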
Correlated Neurons
Index   P. Corr.   Cos Sim.
130     +0.13      0.02
282     +0.11      0.02
15      +0.10      0.02
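Here is a minimal sketch of the two statistics. It assumes that "P. Corr." is the Pearson correlation between a feature's activations and a neuron's activations over the same tokens, and that "Cos Sim." is an ordinary cosine similarity. The dashboard does not say which vectors it compares for the cosine. The function names and toy series are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy series: y is x shifted by a constant.
x = [1.0, 2.0, 3.0]
y = [11.0, 12.0, 13.0]
print(round(pearson(x, y), 3), round(cosine(x, y), 3))  # → 1.0 0.949
```

Adding a constant to either series does not change the Pearson correlation, but it does change the cosine similarity. Two columns computed these ways therefore need not agree in magnitude.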
Negative Logits
ĵ      -3.13
Ĥ      -3.13
IJ     -3.08
©      -3.07
Ĥ¬     -3.06
»¿     -2.96
ĻĤ     -2.93
ĸ      -2.87
Īĺ     -2.79
§      -2.73
Positive Logits
requency    1.67
plic        1.66
antry       1.64
elve        1.60
olta        1.58
flix        1.56
evol        1.51
osto        1.51
Tube        1.47
igan        1.45
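The two logit lists can be read as the tokens whose next-token logits the feature most decreases and most increases. Here is a minimal sketch, assuming this effect is computed by multiplying the feature's output direction by the model's unembedding matrix. The sketch ignores any final layer norm and later layers. `logit_effects`, its inputs, and the toy vocabulary are hypothetical.

```python
def logit_effects(direction, unembed, vocab, k=3):
    """Top-k and bottom-k tokens by direct logit effect of a feature direction.

    Assumption: effect on token t = sum_j direction[j] * unembed[j][t], where
    unembed has one row per model dimension and one column per vocab entry.
    Layer norm and any later layers are ignored in this sketch.
    """
    logits = [sum(d * row[t] for d, row in zip(direction, unembed))
              for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: logits[t])  # ascending
    top = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    bottom = [(vocab[t], logits[t]) for t in order[:k]]
    return top, bottom


# Toy 2-dimensional model with a 3-token vocabulary (hypothetical numbers).
top, bottom = logit_effects([1.0, 2.0],
                            [[1.0, 0.0, -2.0],
                             [0.0, 1.0, 0.5]],
                            ["a", "b", "c"], k=1)
print(top, bottom)  # → [('b', 2.0)] [('c', -1.0)]
```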
Activations Density 0.123%
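Activation density is most naturally read as the fraction of tokens on which the feature is active. Here is a minimal sketch, assuming "active" means an activation strictly above zero. The dashboard does not state its threshold, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which a feature's activation exceeds threshold.

    Assumption: "active" means activation > 0; the dashboard's actual
    threshold is not stated.
    """
    return sum(1 for a in activations if a > threshold) / len(activations)


# Toy activations over 8 tokens: 1 of 8 is active.
print(activation_density([0.0, 0.0, 3.2, 0.0, 0.0, 0.0, 0.0, 0.0]))  # → 0.125
```

Under this reading, the reported 0.123% means the feature fires on about 1.23 tokens per 1,000.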