INDEX
Explanations
the beginning or end of text
Neuron Alignment
Index    Value    % of L₁
674      +0.45    3.0%
1741     +0.07    0.5%
104      +0.06    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1403     +0.45       0.21
800      +0.07       0.24
354      +0.06       0.24
Negative Logits
reluct      -4.55
impra       -4.37
increa      -4.30
indestru    -4.24
inev        -4.15
disagre     -4.11
disgra      -4.09
shenan      -4.03
disreg      -4.03
depic       -4.03
Positive Logits
<bos>           14.05
betweenstory    2.48
GEBURTSDATUM    2.44
Autoritní       2.35
expandindo      2.33
Italijani       2.22
تقاوى           2.19
Paglinawan      2.15
Panamoan        2.12
Италијани       2.04
Activations Density 0.079%