INDEX
Explanations
items received as gifts and expressions of enthusiasm towards them
Neuron Alignment
Index  Value  % of L₁
1013   +0.10  0.3%
519    +0.10  0.3%
1129   +0.08  0.2%
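One common reading of this table is that it lists the entries of the feature's decoder vector with the largest weights, each shown with its signed value and its share of the vector's L₁ norm. The sketch below computes that reading. The helper name neuron_alignment, the ranking by absolute value, and the toy vector are assumptions for illustration, not the dashboard's actual code.

```python
import numpy as np


def neuron_alignment(decoder_vec, k=3):
    """Hypothetical helper: list the k largest-magnitude entries of a
    feature's decoder vector as (neuron index, signed value, share of L1)."""
    v = np.asarray(decoder_vec, dtype=float)
    l1 = np.abs(v).sum()  # L1 norm of the whole decoder vector
    # Assumption: neurons are ranked by |value|; a stable sort keeps ties in index order.
    top = np.argsort(-np.abs(v), kind="stable")[:k]
    return [(int(i), float(v[i]), float(abs(v[i]) / l1)) for i in top]


rows = neuron_alignment([0.1, -0.4, 0.3, 0.2], k=2)
for idx, value, share in rows:
    print(f"{idx:<6}{value:+.2f}  {share:.1%}")
```

Read this way, a +0.10 entry that is only 0.3% of L₁ implies an L₁ norm on the order of 0.10 / 0.003 ≈ 33, so the decoder weight is spread thinly over many neurons rather than concentrated in any one.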
Correlated Neurons
Index  Pearson Corr.  Cosine Sim.
1953   +0.10          0.03
1062   +0.10          0.04
581    +0.08          0.03
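The dashboard does not say exactly how these two columns are computed. Assuming both compare the feature's activations with each neuron's activations over the same set of tokens (the cosine column may instead compare weight vectors), a minimal sketch looks like this. The helper correlated_neurons and the toy data are hypothetical.

```python
import numpy as np


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Hypothetical helper: rank neurons by Pearson correlation with a feature.

    feature_acts: shape (num_tokens,), the feature's activation on each token.
    neuron_acts:  shape (num_tokens, num_neurons), neuron activations on the same tokens.
    Returns (neuron index, Pearson r, cosine similarity) for the top-k neurons.
    """
    f = np.asarray(feature_acts, dtype=float)
    n = np.asarray(neuron_acts, dtype=float)
    # Pearson r is the cosine similarity of the mean-centred activation vectors.
    fc = f - f.mean()
    nc = n - n.mean(axis=0)
    pearson = (fc @ nc) / (np.linalg.norm(fc) * np.linalg.norm(nc, axis=0))
    # Plain cosine similarity skips the centring step.
    cosine = (f @ n) / (np.linalg.norm(f) * np.linalg.norm(n, axis=0))
    top = np.argsort(-pearson, kind="stable")[:k]
    return [(int(i), float(pearson[i]), float(cosine[i])) for i in top]


feature = [0.0, 1.0, 2.0, 3.0]
neurons = np.array([[3.0, 0.0, 1.0],
                    [2.0, 2.0, 0.0],
                    [1.0, 4.0, 1.0],
                    [0.0, 6.0, 0.0]])
res = correlated_neurons(feature, neurons, k=2)
print(res)
```

Because cosine similarity does not subtract the mean, the two statistics can disagree: in the toy data, neuron 0 is perfectly anti-correlated with the feature (r = -1) yet still has a positive cosine similarity of about 0.29.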
Negative Logits
milf      -1.59
hairc     -1.49
increa    -1.47
thut      -1.40
fta       -1.37
?...      -1.36
ugg       -1.36
strick    -1.36
!...      -1.35
wherea    -1.34
Positive Logits
gift       0.94
gifts      0.92
🎁         0.80
<bos>      0.75
Gift       0.74
gift       0.74
Gifts      0.69
Christmas  0.65
Gift       0.65
gif        0.65
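The two logit lists are usually read as the vocabulary tokens whose output logits the feature's decoder direction raises or lowers the most. Assuming the direction is multiplied straight through the unembedding matrix with no final layer norm, a minimal sketch is below. The helper logit_effects, the toy identity unembedding, and the four-token vocabulary are hypothetical.

```python
import numpy as np


def logit_effects(decoder_vec, W_U, vocab, k=3):
    """Hypothetical helper: tokens whose logits a feature's decoder direction
    raises (positive) or lowers (negative) the most.

    decoder_vec: shape (d_model,); W_U: shape (d_model, vocab_size).
    Assumption: the direction is projected straight through W_U, with no
    final layer norm applied.
    """
    logits = np.asarray(decoder_vec, dtype=float) @ np.asarray(W_U, dtype=float)
    order = np.argsort(logits, kind="stable")  # ascending
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


vocab = ["a", "b", "c", "d"]
W_U = np.eye(4)  # toy unembedding: d_model == vocab_size
pos, neg = logit_effects([0.5, -1.0, 2.0, -0.25], W_U, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

Repeated surface forms such as "gift" and "Gift" in the real list are most likely distinct vocabulary entries (for example, with and without a leading space), which is why they appear with different values.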
Activation Density: 0.419%
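Activation density is typically the percentage of tokens on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above zero (the threshold and the helper name activation_density are assumptions):

```python
import numpy as np


def activation_density(feature_acts, threshold=0.0):
    """Hypothetical helper: percentage of tokens on which the feature fires,
    i.e. its activation exceeds `threshold` (assumed to be 0)."""
    a = np.asarray(feature_acts, dtype=float)
    return 100.0 * float(np.mean(a > threshold))


print(activation_density([0.0, 0.5, 0.0, 2.0]))  # 50.0
```

Under this definition, a density of 0.419% means the feature fires on roughly 4 of every 1,000 tokens.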