Explanations
phrases indicating possession or relation between entities
Neuron Alignment
Index    Value    % of L₁
99       +0.14    0.8%
450      +0.12    0.7%
58       +0.11    0.6%
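
The "% of L₁" figures are not defined on this page. One plausible reading, offered here only as an assumption, is that each value is the neuron's absolute weight divided by the L₁ norm (the sum of absolute values) of the whole direction vector. A minimal Python sketch of that reading follows; the function names `l1_share` and `top_aligned` and the toy vector are hypothetical and do not come from the dashboard.

```python
def l1_share(weights):
    """Return each weight's absolute value as a fraction of the vector's L1 norm.

    Assumption: this is what a "% of L1" column reports; the source page
    does not define the column.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [abs(w) / l1 for w in weights]


def top_aligned(weights, k=3):
    """Return (index, value, share) for the k weights largest in magnitude."""
    shares = l1_share(weights)
    # Rank positions by absolute weight, largest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], shares[i]) for i in order[:k]]


# Toy vector (illustrative only, not the feature's real weights).
toy = [0.5, -0.25, 0.125, 0.125]
for idx, value, share in top_aligned(toy, k=2):
    print(f"{idx}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, a value of +0.14 at 0.8% would imply an L₁ norm of roughly 17.5, which is broadly in line with the other two rows given their rounding.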
Correlated Neurons
Index    P. Corr.    Cos Sim.
99       +0.14       0.02
58       +0.12       0.02
450      +0.11       0.02
Negative Logits
»¿       -2.28
¬        -2.27
«        -2.18
Īĺ       -2.11
Ĺ        -2.02
ĭ        -1.96
ij       -1.95
ł        -1.94
Ĥ        -1.92
¤        -1.86
Positive Logits
enstein  1.89
minist   1.72
grounds  1.63
noon     1.56
ovir     1.56
ball     1.55
stock    1.51
oso      1.50
ivir     1.48
older    1.45
Activations Density: 0.063%