INDEX
Explanations
locations or events in various categories such as books, magazines, vehicles, and political issues
Neuron Alignment
Index   Value   % of L₁
674     +0.24   3.6%
1741    +0.06   0.9%
1870    +0.04   0.6%
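The "% of L₁" figures in the Neuron Alignment rows look consistent with each value's magnitude divided by the L₁ norm of the full alignment vector. The sketch below checks that reading. The L₁ norm is not reported in the source, so `ASSUMED_L1_NORM` is back-solved from the first row and is an assumption, as is the helper name `l1_share`.

```python
def l1_share(value: float, l1_norm: float) -> float:
    """Return |value| as a percentage of a vector's L1 norm."""
    return 100.0 * abs(value) / l1_norm


# Values copied from the Neuron Alignment rows (index -> value).
rows = {674: 0.24, 1741: 0.06, 1870: 0.04}

# ASSUMPTION: the L1 norm is not given in the source. It is back-solved here
# from the first row (0.24 is listed as 3.6% of L1), so it is approximate.
ASSUMED_L1_NORM = 0.24 / 0.036

# Percentage share of each value, rounded to one decimal like the table.
shares = {idx: round(l1_share(v, ASSUMED_L1_NORM), 1) for idx, v in rows.items()}

for idx, pct in shares.items():
    print(f"{idx}: {pct}%")
```

Under this assumed norm, the other two rows reproduce their listed percentages as well, which supports (but does not prove) the magnitude-over-L₁ interpretation. The displayed values are rounded, so the implied norm is only approximate.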
Correlated Neurons
Index   P. Corr.   Cos Sim.
674     +0.24      0.05
108     +0.06      0.24
1713    +0.04      0.30
Negative Logits
belliger    -1.68
despotism   -1.66
unlaw       -1.64
ruinous     -1.57
unspeak     -1.53
disgra      -1.48
massacres   -1.46
nukes       -1.44
odious      -1.43
Fascism     -1.42
Positive Logits
<bos>             17.47
expandindo        2.88
GEBURTSDATUM      2.88
betweenstory      2.85
Administrativna   2.75
تقاوى   2.71
Autoritní         2.70
Italijani         2.57
Италијани         2.55
Мексичка          2.47
Activations Density 0.176%