INDEX
Explanations
mentions of specific locations or proper nouns
Neuron Alignment
Index    Value    % of L₁
50       +0.25    0.8%
1842     +0.14    0.5%
1343     +0.13    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
50       +0.25       0.05
16       +0.14       0.07
752      +0.13       0.04
Negative Logits
ностран        -0.61
<bos>          -0.58
:""            -0.57
спользова      -0.57
desertcart     -0.55
ISPR           -0.55
دیکھیے         -0.55
>);            -0.55
ATEGY          -0.54
AILABILITY     -0.54
Positive Logits
aen            1.13
fta            1.11
fte            1.10
intrigu        1.08
accla          1.07
stockholm      1.06
miu            1.06
franz          1.04
illi           1.04
emphat         1.04
Activations Density: 0.287%