INDEX
Explanations
the beginning of sentences or paragraphs
Neuron Alignment
Index   Value   % of L₁
674     +0.53   6.3%
1842    +0.06   0.7%
1741    +0.05   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
674     +0.53      0.24
108     +0.06      0.30
1523    +0.05      0.24
Negative Logits
Token        Logit
reluct       -2.60
unlaw        -2.51
disgra       -2.46
impra        -2.39
disagre      -2.35
unspeak      -2.35
impractica   -2.32
increa       -2.26
Juf          -2.24
unwarran     -2.18
Positive Logits
Token             Logit
<bos>             13.29
Autoritní         2.21
Administrativna   2.15
betweenstory      2.01
expandindo        2.00
تقاوى             2.00
GEBURTSDATUM      1.95
Italijani         1.90
Мексичка          1.88
kasarigan         1.80
Activations Density: 0.910%