INDEX
Explanations
phrases related to teamwork and collaboration
Neuron Alignment
Index   Value   % of L₁
490     +0.08   0.2%
658     +0.07   0.2%
2033    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1737    +0.08      0.04
1859    +0.07      0.03
658     +0.07      0.05
Negative Logits
dises    -1.33
fta      -1.27
oner     -1.27
effe     -1.26
mef      -1.21
increa   -1.19
inev     -1.17
squa     -1.17
desir    -1.16
fuf      -1.15
Positive Logits
prostu          0.64
simply          0.63
always          0.59
resort          0.57
immediately     0.57
instantly       0.57
recourse        0.56
don             0.56
automatically   0.55
usually         0.55
Activations Density: 0.346%