INDEX
Explanations
words related to limitations or boundaries
Neuron Alignment
Index   Value   % of L₁
1482    +0.17   0.7%
130     +0.16   0.7%
1271    +0.13   0.6%
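
The "% of L₁" column is not defined on this page. One plausible reading, offered here only as an assumption, is that it reports each neuron's absolute weight as a share of the L₁ norm (sum of absolute values) of the whole weight vector. A minimal sketch of that reading, using a hypothetical helper `l1_share` and toy numbers rather than the feature's actual weights:

```python
def l1_share(weights):
    """Return each entry's share of the vector's L1 norm, in percent.

    Assumption: "% of L1" means |w_i| / sum_j |w_j| * 100.
    The page itself does not state the formula.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        # An all-zero vector has no meaningful shares; report zeros.
        return [0.0 for _ in weights]
    return [100.0 * abs(w) / l1_norm for w in weights]


# Toy vector (not taken from the dashboard).
toy_weights = [0.5, -0.25, 0.25]
shares = l1_share(toy_weights)
```

Under this reading, a value of +0.17 at 0.7% would imply a total L₁ norm in the low twenties, which is consistent with weight spread thinly across many neurons.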
Correlated Neurons
Index   P. Corr.   Cos Sim.
130     +0.17      0.04
1482    +0.16      0.03
395     +0.13      0.03
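
The two columns above measure different things, which is why their values can diverge. Assuming "P. Corr." is a Pearson correlation and "Cos Sim." is a cosine similarity (the page does not say which vectors each is computed over), the standard definitions can be sketched as follows. The function names are illustrative, not part of any dashboard API:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy vectors (not taken from the dashboard).
r_up = pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson_correlation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Pearson correlation subtracts the mean before comparing, so it is insensitive to a constant offset, whereas cosine similarity is not.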
Negative Logits
لينكات            -0.57
ValueGeneration   -0.52
URERS             -0.50
ANTAGES           -0.49
leyeb             -0.49
disambiguazione   -0.48
ensement          -0.48
URATION           -0.47
providedIn        -0.46
fået              -0.46
Positive Logits
li        1.09
Li        1.07
Li        1.03
LI        0.96
quoique   0.94
Lith      0.92
LIP       0.89
Lilli     0.85
Lia       0.82
Lio       0.82
Activations Density 0.145%