INDEX
Explanations
the beginning or end of a text document
Neuron Alignment
Index  Value  % of L₁
674    +0.27  5.0%
1741   +0.05  1.0%
50     +0.05  0.9%
Correlated Neurons
Index  P. Corr.  Cos Sim.
674    +0.27     0.12
1713   +0.05     0.75
108    +0.05     0.58
Negative Logits
massacres       -1.13
disinformation  -1.01
atrocities      -0.98
traitors        -0.96
fascism         -0.96
Fascism         -0.96
bloodshed       -0.95
mismanagement   -0.95
despotism       -0.94
bombings        -0.94
Positive Logits
<bos>            17.06
expandindo       2.83
GEBURTSDATUM     2.80
betweenstory     2.71
Administrativna  2.61
Autoritní        2.59
تقاوى            2.58
Италијани        2.41
Italijani        2.40
Мексичка         2.34
Activations Density 0.958%