INDEX
Explanations
emotional expressions related to love and relationships
Neuron Alignment
Index   Value   % of L₁
258     +0.22   1.3%
23      +0.22   1.3%
203     +0.18   1.1%
Correlated Neurons
Index   P. Corr.   Cos Sim.
258     +0.22      0.20
23      +0.22      0.16
189     +0.18      0.12
Negative Logits
tees          -2.11
ème           -1.90
unpublished   -1.70
ister         -1.65
book          -1.64
remarks       -1.61
bourg         -1.59
nikov         -1.58
aine          -1.58
reviewer      -1.57
Positive Logits
They         1.75
Maybe        1.69
rap          1.68
while        1.58
_**          1.57
Especially   1.54
habit        1.51
mozilla      1.51
Holy         1.49
Sure         1.49
Activations Density: 1.715%