INDEX
Explanations
the beginning of a text document
Neuron Alignment
Index   Value   % of L₁
674     +0.24   3.6%
1741    +0.06   0.9%
1870    +0.04   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
674     +0.24      0.05
108     +0.06      0.24
1713    +0.04      0.30
Negative Logits
unlaw         -1.82
belliger      -1.78
despotism     -1.75
unspeak       -1.71
ruinous       -1.67
disgra        -1.67
impractica    -1.57
odious        -1.52
nukes         -1.51
massacres     -1.49
Positive Logits
<bos>             17.52
expandindo        2.89
GEBURTSDATUM      2.89
betweenstory      2.87
Administrativna   2.76
تقاوى             2.73
Autoritní         2.71
Italijani         2.59
Италијани         2.57
Мексичка          2.49
Activations Density 0.176%