INDEX
Explanations
phrases related to teamwork and cooperation
Neuron Alignment
Index   Value   % of L₁
2030    +0.08   0.2%
1040    +0.07   0.2%
872     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2030    +0.08      0.05
1176    +0.07      0.04
1746    +0.07      0.04
Negative Logits
fte     -1.62
fta     -1.60
„,      -1.55
lts     -1.52
fup     -1.51
effe    -1.51
sii     -1.51
aen     -1.50
nece    -1.50
mef     -1.50
Positive Logits
to      0.89
ที่จะ    0.74
να      0.70
if      0.65
for     0.62
that    0.60
чтобы   0.59
để      0.58
า       0.58
щоб     0.58
Activations Density 0.293%