INDEX
Explanations
names of individuals or official positions in a written piece
Neuron Alignment
Index   Value   % of L₁
1842    +0.32   1.1%
1177    +0.18   0.6%
394     +0.16   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1177    +0.32      0.09
227     +0.18      0.15
981     +0.16      0.15
Negative Logits
<bos>     -1.01
solve     -0.55
zdar      -0.54
zatem     -0.54
give      -0.54
read      -0.53
trzeba    -0.52
oznacza   -0.52
spoko     -0.52
قديم      -0.52
Positive Logits
Juf        1.63
alkoh      1.47
antik      1.42
Khart      1.42
Keny       1.41
Bartholo   1.40
optik      1.40
Abbé       1.39
Cfr        1.39
Sarm       1.38
Activations Density 2.269%