Explanations
expressions of physical affection, particularly hugging
Neuron Alignment
Index   Value   % of L₁
1823    +0.10   0.3%
1899    +0.09   0.2%
1533    +0.09   0.2%
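The Neuron Alignment table lists the neurons with the largest weights in this feature's decoder vector. Below is a minimal sketch of how such a table could be built. It assumes "Value" is the raw decoder weight for a neuron and "% of L₁" is the absolute value of that weight divided by the L₁ norm of the whole decoder vector. The function name `top_aligned_neurons` and the toy vector are hypothetical.

```python
def top_aligned_neurons(decoder_vec, k=3):
    """Return (index, value, fraction of L1 norm) for the k largest weights.

    Assumption: "Value" is the decoder weight itself and "% of L1" is
    |value| / sum(|w| for w in decoder_vec).
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by signed decoder weight, largest first.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: decoder_vec[i], reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Hypothetical 6-dimensional decoder vector whose L1 norm is 1.0.
toy = [0.05, -0.20, 0.40, 0.10, -0.05, 0.20]
for idx, val, frac in top_aligned_neurons(toy):
    print(f"{idx:>4}  {val:+.2f}  {frac:.1%}")
```

Under this assumption, a weight of +0.10 at 0.3% of L₁ implies a decoder vector whose L₁ norm is roughly 30 to 40, so no single neuron dominates the feature.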
Correlated Neurons
Index   P. Corr.   Cos Sim.
1823    +0.10      0.03
509     +0.09      0.04
736     +0.09      0.04
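The Correlated Neurons table reports two numbers per neuron. A minimal sketch of both, assuming "P. Corr." is the Pearson correlation between the feature's and the neuron's activations over the same tokens, and "Cos Sim." is the cosine similarity between two weight vectors of equal length. Which weight vectors the dashboard compares is not stated, so the inputs below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation series.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical per-token activations of the feature and of one neuron.
feature_acts = [0.0, 0.0, 3.2, 0.0, 1.5]
neuron_acts = [0.1, -0.2, 0.9, 0.0, 0.4]
print(f"P. Corr.: {pearson(feature_acts, neuron_acts):+.2f}")
print(f"Cos Sim.: {cosine([1.0, 0.0], [1.0, 1.0]):.2f}")  # 1/sqrt(2), about 0.71
```

Correlations near +0.10 and cosine similarities near 0.04, as in the table, would indicate that no individual neuron closely tracks this feature.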
Negative Logits
kesi      -0.61
jaya      -0.60
saha      -0.60
istan     -0.58
maksi     -0.56
vider     -0.56
felipe    -0.56
rodrigo   -0.54
alberto   -0.54
kasa      -0.54
Positive Logits
hug          0.66
<bos>        0.63
hugged       0.62
hugs         0.61
hugging      0.57
queeze       0.55
comforting   0.55
hug          0.52
embrace      0.51
Hug          0.50
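The negative and positive logit lists rank vocabulary tokens by how strongly the feature pushes their output logits down or up. A minimal sketch of one common way to produce such lists, assuming the scores come from projecting the feature's decoder vector through the model's unembedding matrix (a "logit lens" style readout); the exact scaling behind the dashboard's numbers is not stated. The function `top_logits`, the toy vocabulary, and the weights are hypothetical.

```python
def top_logits(decoder_vec, unembed, vocab, k=3):
    """Rank tokens by decoder_vec . unembed[:, t] (logit-lens style readout).

    unembed[d][t] is the weight from residual dimension d to token t.
    Returns (most_negative, most_positive), each a list of (token, score).
    """
    scores = []
    for t, token in enumerate(vocab):
        s = sum(decoder_vec[d] * unembed[d][t] for d in range(len(decoder_vec)))
        scores.append((token, s))
    scores.sort(key=lambda pair: pair[1])  # ascending by score
    return scores[:k], scores[::-1][:k]


# Hypothetical 2-dimensional model with a 4-token vocabulary.
vocab = ["hug", "embrace", "kesi", "table"]
decoder_vec = [1.0, 0.5]
unembed = [
    [0.6, 0.5, -0.6, 0.0],
    [0.2, 0.1, 0.0, 0.1],
]
neg, pos = top_logits(decoder_vec, unembed, vocab, k=2)
for token, score in neg + pos:
    print(f"{token:<8} {score:+.2f}")
```

Under this reading, the feature's decoder direction aligns with the unembedding of hug-related tokens and points away from the name-like tokens in the negative list.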
Activation Density: 0.193%
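Activation density summarizes how sparse the feature is. A minimal sketch, assuming density means the fraction of tokens on which the feature's activation is above zero; the function name `activation_density` and the sample activations are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    return sum(1 for a in acts if a > threshold) / len(acts)


# Hypothetical sample: the feature fires on 2 of 1,000 tokens.
acts = [0.0] * 998 + [2.4, 0.7]
print(f"{activation_density(acts):.3%}")  # 0.200%
```

Under this definition, a density of 0.193% means the feature is active on roughly one token in 520.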