INDEX
Explanations
contractions with an apostrophe
Neuron Alignment
Index   Value   % of L₁
478     +0.11   0.3%
1839    +0.11   0.3%
1741    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
19      +0.11      0.06
1902    +0.11      0.06
478     +0.09      0.06
Negative Logits
julio        -0.50
ab           -0.49
Algemeen     -0.48
дописавши    -0.48
Vrij         -0.47
dis          -0.46
Terug        -0.46
sistem       -0.45
цездатний    -0.45
Stap         -0.45
Positive Logits
parteci      0.96
purcha       0.96
tramonto     0.95
disagre      0.95
inev         0.93
Luglio       0.93
Giugno       0.90
madonna      0.90
excru        0.89
tremb        0.88
Activations Density: 0.307%