Explanations
the beginning and end of the document as important markers
Neuron Alignment
Index   Value   % of L₁
674     +0.22   4.5%
1741    +0.04   0.8%
2019    +0.03   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
674     +0.22      0.06
1523    +0.04      0.20
108     +0.03      0.22
Negative Logits
massacres        -1.07
disinformation   -0.95
traitors         -0.94
Fascism          -0.94
atrocities       -0.93
bombings         -0.92
despotism        -0.92
fascism          -0.92
blasphemy        -0.91
mismanagement    -0.90
Positive Logits
<bos>            16.79
expandindo       2.75
GEBURTSDATUM     2.73
betweenstory     2.62
Administrativna  2.52
تقاوى            2.52
Autoritní        2.50
dispen           2.39
Италијани        2.32
Italijani        2.28
Activations Density: 0.175%