INDEX
Explanations: political entities and officials
Neuron Alignment
Index   Value   % of L₁
1741    +0.15   0.5%
1445    +0.15   0.5%
382     +0.14   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1445    +0.15      0.05
1538    +0.15      0.05
382     +0.14      0.04
Negative Logits
maroc        -1.18
milano       -1.18
fluo         -1.12
igno         -1.07
nutr         -1.06
italia       -1.04
ù            -1.04
tranquillo   -1.02
istr         -1.01
erec         -1.00
Positive Logits
Ekster       0.84
Literat      0.75
Pró          0.72
Izvori       0.72
Fontes       0.67
Atsauces     0.67
Alguns       0.66
meanwhile    0.66
Pode         0.65
unspeak      0.64
Activations Density 0.138%