INDEX
Explanations
emotions or interactions involving physical touch and expressions
Neuron Alignment
Index   Value   % of L₁
1150    +0.15   0.4%
906     +0.13   0.4%
674     +0.13   0.4%
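A minimal sketch of how a Neuron Alignment table could be built. It assumes two things this section does not state: Value is a neuron's entry in the feature's decoder vector, and % of L₁ is that entry's absolute value as a percentage of the vector's L₁ norm. The function name `top_neuron_alignment` and the toy vector are hypothetical.

```python
def top_neuron_alignment(decoder_vec, k=3):
    """Return (index, value, pct_of_l1) for the k entries with the largest |value|.

    Hypothetical definition: pct_of_l1 = 100 * |value| / sum_i |w_i|.
    """
    l1 = sum(abs(w) for w in decoder_vec)  # L1 norm of the decoder vector
    # Rank neuron indices by absolute weight, largest first (ties keep original order).
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], 100.0 * abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Toy decoder vector with L1 norm 1.0.
rows = top_neuron_alignment([0.125, -0.5, 0.25, 0.125], k=3)
for index, value, pct in rows:
    print(index, f"{value:+.2f}", f"{pct:.1f}%")
```

Ranking by absolute value means strongly negative weights appear alongside positive ones, which is why the Value column carries an explicit sign.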
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
736     +0.15           0.05
850     +0.13           0.04
906     +0.13           0.00
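The Correlated Neurons table reports two similarity scores side by side. Pearson correlation is the cosine similarity of two vectors after each one is centered on its own mean, so the two scores need not agree. The sketch below assumes both scores compare equal-length vectors of values. This section does not say which vectors the dashboard actually compares, and the function names are illustrative.

```python
import math


def cosine_similarity(x, y):
    """Uncentered similarity: dot(x, y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_corr(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


x, y = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]
print(pearson_corr(x, y), cosine_similarity(x, y))
```

For x = [1, 2, 3] and y = [3, 2, 1], the Pearson correlation is -1 (up to floating-point rounding), while the cosine similarity is 10/14, about 0.71. The two columns can therefore differ in both sign and size.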
Negative Logits
fuf         -2.05
accla       -2.04
strick      -2.00
embra       -2.00
reluct      -2.00
inev        -1.97
increa      -1.96
emphat      -1.94
guarante    -1.93
affor       -1.90
Positive Logits
<bos>          0.68
.              0.64
<eos>          0.64
invokeLater    0.61
again          0.61
єра            0.60
stereotype     0.60
under          0.59
bune           0.59
евна           0.59
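Here is a minimal sketch of how per-token values like those in the Negative Logits and Positive Logits lists could be produced. It assumes, although this section does not state it, that each value is the dot product of the feature's decoder direction with that token's unembedding vector. The list-of-lists layout (`unembed[i][t]` maps residual dimension `i` to token `t`) and the names `logit_effects` and `top_and_bottom` are hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Map each token to sum_i decoder_dir[i] * unembed[i][t] (hypothetical layout)."""
    effects = []
    for t in range(len(vocab)):
        effects.append(sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir))))
    return dict(zip(vocab, effects))


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    return ranked[::-1][:k], ranked[:k]


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
effects = logit_effects([1.0, 0.0], [[2.0, -1.0, 0.5], [9.0, 9.0, 9.0]], ["a", "b", "c"])
positive, negative = top_and_bottom(effects, k=1)
print(positive, negative)
```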
Activation Density: 0.204%
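Suppose activation density is the percentage of tokens on which the feature's activation is above zero. That is an assumption, not something this section defines. Under it, 0.204% corresponds to roughly 2 tokens in every 1,000. A minimal sketch, where `activation_density` and its `threshold` parameter are hypothetical:

```python
def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    fired = sum(1 for a in acts if a > threshold)  # count tokens where the feature fires
    return 100.0 * fired / len(acts)


# 1,000 toy tokens, of which the feature fires on 2.
acts = [0.0] * 998 + [1.5, 0.3]
print(f"{activation_density(acts):.3f}%")
```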