INDEX
Explanations
names of places or names of people
Neuron Alignment
Index    Value    % of L₁
1896     +0.15    0.6%
381      +0.15    0.6%
1177     +0.14    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
981      +0.15       0.07
1896     +0.15       0.05
144      +0.14       0.05
Negative Logits
Token     Value
<bos>     -1.30
그것      -0.79
나는      -0.73
for       -0.71
just      -0.71
in        -0.70
자신의    -0.69
at        -0.69
책        -0.68
within    -0.68
Positive Logits
Token      Value
embra      1.81
effe       1.80
dispen     1.80
alkoh      1.76
abnorm     1.72
pessi      1.71
bett       1.69
kram       1.69
simplif    1.68
wien       1.68
Activations Density 0.297%