INDEX
Explanations
words related to statistical or numerical disproportionality
Neuron Alignment
Index    Value    % of L₁
1328     +0.13    0.4%
1793     +0.12    0.4%
1861     +0.12    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1363     +0.13       0.02
1793     +0.12       0.02
1861     +0.12       0.02
Negative Logits
impractica    -0.92
unlaw         -0.91
impra         -0.89
inappro       -0.88
Rine          -0.87
reluct        -0.86
unve          -0.81
Vaugh         -0.80
Derivation    -0.80
Eft           -0.79
Positive Logits
disproportion    1.01
proportion       0.93
proportion       0.87
propor           0.82
proportional     0.79
proportions      0.74
propor           0.73
agrí             0.71
proportionate    0.69
proportioned     0.65
Activations Density 0.065%