Explanations
the beginning of each new document
Neuron Alignment
Index   Value   % of L₁
674     +0.24   3.6%
1741    +0.06   0.8%
1870    +0.04   0.6%
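The "% of L₁" column appears to report each neuron's share of the L₁ norm of the full alignment vector. The listed rows are consistent with that reading: 0.24 at 3.6% and 0.04 at 0.6% both imply a total L₁ norm of roughly 7. A minimal sketch of that calculation follows, assuming this interpretation of the column; the function name and the toy vector are hypothetical and are not the feature's real weights.

```python
def percent_of_l1(weights, index):
    """Return the share of the vector's L1 norm carried by one entry, in percent."""
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1


# Hypothetical toy vector (assumption), chosen so the L1 norm is 1.0.
toy_weights = [0.24, -0.06, 0.04, 0.66]
share = percent_of_l1(toy_weights, 0)
print(f"{share:.1f}% of L1")
```

A small percentage for the top entry, as in the table above, indicates that the weight is spread across many neurons rather than concentrated in a few.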
Correlated Neurons
Index   P. Corr.   Cos Sim.
674     +0.24      0.05
108     +0.06      0.24
1713    +0.04      0.30
Negative Logits
Token       Logit
belliger    -1.60
despotism   -1.60
unlaw       -1.49
ruinous     -1.49
massacres   -1.43
nukes       -1.38
Fascism     -1.38
unspeak     -1.38
odious      -1.35
demoral     -1.34
Positive Logits
Token             Logit
<bos>             17.44
expandindo        2.88
GEBURTSDATUM      2.87
betweenstory      2.84
Administrativna   2.73
تقاوى             2.70
Autoritní         2.69
Italijani         2.56
Италијани         2.54
Мексичка          2.46
Activations Density 0.176%
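Activation density is commonly read as the percentage of sampled tokens on which the feature fires at all. The sketch below assumes that definition, with "fires" meaning an activation above a threshold of zero; the function name, the threshold parameter, and the toy activations are hypothetical and do not come from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("no activations supplied")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy sample (assumption): 2 active tokens out of 1000.
toy_activations = [0.0] * 998 + [1.5, 17.0]
density = activation_density(toy_activations)
print(f"Activations Density {density:.3f}%")
```

Under this reading, a density of 0.176% means the feature is active on fewer than 2 tokens in every 1000 sampled.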