INDEX
Explanations
a list of items or rankings
Neuron Alignment
Index   Value   % of L₁
2019    +0.29   0.9%
1343    +0.14   0.4%
1699    +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2019    +0.29      0.05
359     +0.14      0.04
1678    +0.12      0.04
Negative Logits
Zwar           -0.69
umożli         -0.65
Lmfao          -0.65
Uniwers        -0.60
cze            -0.59
Leurs          -0.59
dostar         -0.58
Gleichzeitig   -0.57
didel          -0.57
Pourtant       -0.57
Positive Logits
circon    0.94
ordina    0.87
dispen    0.87
oner      0.86
dissi     0.86
embra     0.85
istan     0.84
Kategor   0.83
ciment    0.83
allarg    0.81
Activations Density: 0.119%