INDEX
Explanations
the beginning of a document or text in a specific format
Neuron Alignment
Index   Value   % of L₁
674     +0.44   5.7%
1741    +0.07   0.8%
381     +0.04   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
15      +0.44      0.33
800     +0.07      0.34
354     +0.04      0.33
Negative Logits
unspeak       -2.82
reluct        -2.70
disgra        -2.63
unlaw         -2.62
shenan        -2.52
impractica    -2.50
impra         -2.48
disagre       -2.44
ineffec       -2.44
horrend       -2.43
Positive Logits
<bos>             14.48
GEBURTSDATUM      2.54
expandindo        2.52
betweenstory      2.49
Autoritní         2.46
تقاوى             2.20
Italijani         2.17
Administrativna   2.16
Paglinawan        2.13
kasarigan         2.11
Activations Density 0.090%