INDEX
Explanations
organizations and facilities that are being referred to in the text
Neuron Alignment
Index   Value   % of L₁
184     +0.15   0.4%
1150    +0.11   0.3%
513     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1438    +0.15      0.04
513     +0.11      0.03
324     +0.10      0.03
Negative Logits
secon   -2.27
wien    -2.25
mef     -2.20
effe    -2.20
fte     -2.18
aen     -2.17
fta     -2.13
nece    -2.13
„,      -2.12
lein    -2.08
Positive Logits
itself     1.07
’          0.92
'          0.92
=""/>      0.77
本身       0.76
tersebut   0.71
Thiết      0.69
하십시오   0.68
itse       0.67
]=="       0.66
Activations Density: 0.173%