INDEX
Explanations
words related to gratitude and support
Neuron Alignment
Index   Value   % of L₁
946     +0.09   0.3%
1129    +0.09   0.2%
1600    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1600    +0.09      0.04
1129    +0.09      0.05
1252    +0.08      0.03
Negative Logits
squa        -1.05
disagre     -1.04
glau        -1.04
affor       -1.03
haup        -1.03
oleo        -1.02
leonardo    -1.01
effe        -1.01
maer        -1.00
inev        -1.00
Positive Logits
support       0.81
ISupport      0.80
supportive    0.76
support       0.74
Support       0.70
solidarity    0.70
upport        0.67
Support       0.67
supports      0.65
donations     0.65
Activations Density: 0.335%