INDEX
Explanations
the beginning of a sentence
Neuron Alignment
Index   Value   % of L₁
674     +0.41   3.0%
1741    +0.06   0.5%
381     +0.06   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1690    +0.41      0.20
354     +0.06      0.19
800     +0.06      0.19
Negative Logits
reluct      -4.42
disagre     -4.18
increa      -4.16
unspeak     -4.11
shenan      -4.11
impra       -4.10
inev        -4.08
disgra      -4.08
indestru    -4.03
inconce     -3.94
Positive Logits
<bos>              14.31
GEBURTSDATUM       2.31
betweenstory       2.30
expandindo         2.25
Autoritní          2.23
تقاوى              2.22
Administrativna    2.08
Italijani          2.08
Panamoan           2.01
Paglinawan         1.98
Activations Density 0.041%