INDEX
Explanations
references to specific geographical locations, such as cities and regions
Neuron Alignment
Index   Value   % of L₁
1909    +0.18   1.0%
1708    +0.17   0.9%
1271    +0.13   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
690     +0.18      0.04
1328    +0.17      0.04
1343    +0.13      0.06
Negative Logits
<bos>               -1.11
OGND                -0.87
Đối                 -0.79
Hozzáférés          -0.75
selaer              -0.69
حياته               -0.64
Phân                -0.64
حياتها              -0.64
MessageTagHelper    -0.63
htbp                -0.63
Positive Logits
increa     1.59
inev       1.53
madonna    1.51
fta        1.51
ftu        1.49
affor      1.47
thut       1.47
emphat     1.46
coö        1.46
alre       1.43
Activations Density: 0.521%