INDEX
Explanations
sentences expressing feelings related to body image and self-reflection
Neuron Alignment
Index    Value    % of L₁
906      +0.12    0.3%
658      +0.10    0.3%
1919     +0.09    0.3%
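The "% of L₁" column can be read as each neuron's share of the feature vector's total absolute weight. The sketch below assumes that definition, which the dashboard itself does not state. The function name `l1_share` and the toy weights are hypothetical and are not taken from the values above.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm."""
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical toy weights, chosen only to make the arithmetic easy to follow.
toy_weights = [0.5, -0.25, 0.25]
print(f"{l1_share(toy_weights, 0):.1%}")  # prints 50.0%
```

Under this reading, a share of 0.3% for a weight near 0.1 would mean the weight is spread thinly over many neurons instead of being concentrated in a few.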
Correlated Neurons
Index    P. Corr.    Cos Sim.
1919     +0.12       0.06
658      +0.10       0.07
1446     +0.09       0.03
Negative Logits
jorge      -1.33
sergio     -1.27
shenan     -1.26
roberto    -1.24
unspeak    -1.23
reluct     -1.23
prétend    -1.20
alberto    -1.20
ineffec    -1.20
increa     -1.20
Positive Logits
excited         0.79
feeling         0.75
worry           0.72
disappointed    0.70
worried         0.70
feel            0.68
feelings        0.68
surprised       0.67
concern         0.67
excitement      0.67
Activations Density 0.507%
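
Activation density is commonly read as the fraction of sampled tokens on which the feature fires. The sketch below assumes that reading and assumes "fires" means an activation strictly above a threshold of zero. Neither detail is confirmed by the dashboard, and `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations: one of four tokens is active.
toy_activations = [0.0, 0.0, 1.3, 0.0]
print(f"{activation_density(toy_activations):.3%}")  # prints 25.000%
```

On this reading, a density of 0.507% would correspond to the feature being active on roughly one token in two hundred.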