INDEX
Explanations
the beginning of a text
Neuron Alignment
Index   Value   % of L₁
674     +0.45   5.7%
1741    +0.07   0.8%
381     +0.04   0.5%
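One way to read the Neuron Alignment table, assuming the usual SAE-dashboard convention (not confirmed by this page): each row is an MLP neuron ranked by the size of the feature's decoder weight on it, "Value" is that weight, and "% of L₁" is its absolute value divided by the L1 norm of the whole decoder vector. The sketch below reproduces that calculation in plain Python. The function name `neuron_alignment` and the toy decoder vector are hypothetical, not taken from the tool that produced this dashboard.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (neuron_index, weight, fraction_of_l1) for the k largest-magnitude weights.

    Assumptions (hypothetical, for illustration): decoder_vec holds one SAE
    feature's decoder weight for every MLP neuron, rows are ranked by |weight|,
    and "% of L1" means |weight| / sum(|w| for w in decoder_vec).
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        raise ValueError("decoder vector is all zeros")
    # Neuron indices ordered from largest to smallest |weight|.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


if __name__ == "__main__":
    # Toy 5-neuron decoder vector (made up); its L1 norm is exactly 4.0.
    toy = [0.5, -2.0, 0.25, 1.0, 0.25]
    for idx, weight, frac in neuron_alignment(toy, k=2):
        print(f"{idx:<8}{weight:+.2f}   {frac:.1%}")
```

Under this reading, the table's first row implies an L1 norm of roughly 0.45 / 0.057 ≈ 7.9 for the decoder vector, which is consistent (within rounding) with the other two rows.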
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
15      +0.45           0.33
800     +0.07           0.34
1403    +0.04           0.29
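Assuming both Correlated Neurons columns compare the feature's activations with each neuron's activations over the same set of tokens (a common convention, not stated on this page), the difference between them is that Pearson correlation subtracts each series' mean before comparing, while cosine similarity does not. The minimal sketch below makes that relationship explicit; `correlated_neurons` and the toy activations are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centred series."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])


def cosine_similarity(x, y):
    """Uncentred similarity: dot(x, y) / (|x| * |y|)."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0  # this sketch's choice: a constant/zero series counts as uncorrelated
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature (hypothetical helper).

    feature_acts: the feature's activation on each of N tokens.
    neuron_acts: one list of N activations per neuron.
    """
    rows = [
        (i, pearson(feature_acts, acts), cosine_similarity(feature_acts, acts))
        for i, acts in enumerate(neuron_acts)
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:k]


if __name__ == "__main__":
    feature = [0.0, 1.0, 0.0, 2.0]
    neurons = [
        [0.0, 1.0, 0.0, 2.0],  # tracks the feature exactly
        [2.0, 1.0, 2.0, 0.0],  # mirror image of the feature
        [5.0, 5.0, 5.0, 5.0],  # constant, carries no information
    ]
    for idx, corr, cos in correlated_neurons(feature, neurons):
        print(f"{idx:<8}{corr:+.2f}   {cos:.2f}")
```

Because of the centring step, a neuron whose activations are the feature's plus a constant offset gets a Pearson correlation of 1 but a cosine similarity below 1, so the two columns can rank neurons differently.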
Negative Logits
unspeak      -2.77
reluct       -2.64
disgra       -2.58
unlaw        -2.58
shenan       -2.47
impractica   -2.46
impra        -2.43
ineffec      -2.40
disagre      -2.39
horrend      -2.38
Positive Logits
<bos>             14.47
GEBURTSDATUM       2.53
expandindo         2.51
betweenstory       2.48
Autoritní          2.46
تقاوى              2.20
Italijani          2.16
Administrativna    2.15
Paglinawan         2.12
kasarigan          2.10
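A common way logit lists like the two above are produced (assumed here rather than confirmed by this page) is to project the feature's decoder direction through the model's unembedding matrix W_U and report the tokens with the largest and smallest results. The sketch below does this with plain lists; it ignores the final layer norm, which real tooling may fold in, and the helper names and toy matrix are hypothetical.

```python
def logit_effects(decoder_dir, W_U, vocab):
    """Direct effect of one feature on every token's logit: decoder_dir @ W_U.

    decoder_dir: length-d_model decoder vector for the feature (hypothetical input).
    W_U: d_model rows, each with one entry per vocabulary token.
    """
    return {
        token: sum(d * row[t] for d, row in zip(decoder_dir, W_U))
        for t, token in enumerate(vocab)
    }


def top_logits(effects, k=10):
    """Return (positive, negative): the k highest and k lowest (token, effect) pairs."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


if __name__ == "__main__":
    vocab = ["a", "b", "c", "d"]
    W_U = [[2.0, 0.0, 1.0, -1.0],  # toy 2 x 4 unembedding (made up)
           [0.0, 3.0, 1.0, 1.0]]
    pos, neg = top_logits(logit_effects([1.0, -1.0], W_U, vocab), k=2)
    print("Positive:", pos)
    print("Negative:", neg)
```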
Activation Density: 0.090%
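Activation density is presumably the fraction of sampled tokens on which the feature is active. Assuming "active" means an activation strictly above zero (the usual rule for a ReLU-style SAE, not stated here), 0.090% works out to about 9 tokens in every 10,000. The helper name and the sample below are hypothetical.

```python
def activation_density(feature_acts, threshold=0.0):
    """Fraction of tokens on which the feature is active.

    Assumption: "active" means activation > threshold, with threshold 0 as for a
    ReLU-style SAE; the dashboard's exact rule is not stated.
    """
    if not feature_acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in feature_acts if a > threshold)
    return active / len(feature_acts)


if __name__ == "__main__":
    # Made-up sample: 9 active tokens out of 10,000.
    sample = [0.0] * 9991 + [1.3] * 9
    print(f"Activation density: {activation_density(sample):.3%}")
```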