INDEX
Explanations
the beginning of an article or text
Neuron Alignment
Index    Value    % of L₁
674      +0.47    4.7%
1741     +0.07    0.7%
381      +0.05    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
15       +0.47       0.29
354      +0.07       0.29
800      +0.05       0.30
Negative Logits
reluct      -4.62
disagre     -4.33
unspeak     -4.31
shenan      -4.29
impra       -4.28
increa      -4.26
indestru    -4.19
disgra      -4.17
inev        -4.12
unlaw       -4.10
Positive Logits
<bos>              14.64
betweenstory       2.43
GEBURTSDATUM       2.39
expandindo         2.37
Autoritní          2.37
تقاوى              2.21
Paglinawan         2.18
Italijani          2.17
Administrativna    2.15
Panamoan           2.09
Activations Density: 0.090%