INDEX
Explanations
punctuation marks at the end of sentences
Neuron Alignment
Index    Value    % of L₁
2019     +0.14    0.4%
674      +0.12    0.4%
1741     +0.11    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1448     +0.14       0.05
961      +0.12       0.04
507      +0.11       0.04
Negative Logits
Óscar           -0.61
secara          -0.61
Compañ          -0.59
intéressante    -0.58
Darío           -0.57
Czym            -0.56
Junto           -0.56
Fø              -0.55
более           -0.55
Zapraszamy      -0.55
Positive Logits
<bos>       1.40
wien        1.06
aquare      0.98
gmbh        0.98
kane        0.97
!...        0.95
levis       0.95
dimentic    0.94
jacques     0.94
waer        0.94
Activations Density 0.247%