Explanations
mentions of physical locations, particularly cities and landmarks
Neuron Alignment
Index    Value    % of L₁
382      +0.22    0.7%
1741     +0.16    0.5%
2034     +0.12    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
382      +0.22       0.10
1265     +0.16       0.06
736      +0.12       0.07
Negative Logits
Token      Value
meras      -0.98
marte      -0.95
alkoh      -0.95
praktik    -0.95
balon      -0.94
parati     -0.94
medes      -0.93
sement     -0.93
ideolog    -0.92
kosme      -0.92
Positive Logits
Token       Value
followed    0.75
plus        0.70
以及        0.70
그리고      0.69
and         0.69
which       0.67
along       0.64
oraz        0.62
coupled     0.62
sowie       0.62
Activations Density: 0.412%