INDEX
Explanations
locations and dates in news articles
Neuron Alignment
Index   Value   % of L₁
304     +0.11   0.3%
906     +0.10   0.3%
1288    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
648     +0.11      0.04
1714    +0.10      0.04
1267    +0.08      0.04
Negative Logits
Wię        -0.87
Bardzo     -0.84
Może       -0.81
Și         -0.81
sólo       -0.79
Przyp      -0.77
Dziękuję   -0.77
Obrigado   -0.76
Jako       -0.75
Więcej     -0.75
Positive Logits
gend    1.32
bett    1.30
mef     1.29
exem    1.29
profi   1.29
wien    1.29
inder   1.27
laun    1.24
embra   1.24
daf     1.24
Activations Density: 0.189%