Explanations
phrases related to sacrifice and selflessness
Neuron Alignment
Index   Value   % of L₁
1171    +0.09   0.2%
297     +0.09   0.2%
1876    +0.07   0.2%
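The "% of L₁" column is not defined in this listing. A minimal sketch, assuming it means the share of a weight vector's L₁ norm (sum of absolute values) carried by a single entry. The function name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return the fraction of the vector's L1 norm carried by one entry.

    Assumption: "% of L1" = |w_i| / sum_j |w_j|. The listing itself
    does not confirm this definition.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Toy vector for illustration only (not dashboard data).
toy_weights = [0.5, -0.25, 0.25]
print(f"{l1_share(toy_weights, 0):.1%}")
```

Under this reading, a value of 0.2% would mean that even the best-aligned neuron carries only a small part of the total weight.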
Correlated Neurons
Index   P. Corr.   Cos Sim.
1171    +0.09      0.05
1876    +0.09      0.03
1298    +0.07      0.03
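For readers comparing the two columns, here is a minimal sketch of how a Pearson correlation ("P. Corr.") and a cosine similarity ("Cos Sim.") are computed for a pair of vectors. Which vectors the dashboard actually compares is not stated in this listing, so the inputs below are toy values and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine similarity of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy vectors for illustration only (not dashboard data).
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]
print(cosine_similarity(u, v), pearson_correlation(u, v))
```

The two measures differ only in the mean-centering step, which is why the same pair of vectors can have a positive cosine similarity and a negative Pearson correlation, as the toy pair above does.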
Negative Logits
maneu   -0.85
mef     -0.85
accla   -0.76
levis   -0.76
jati    -0.74
nephe   -0.73
joo     -0.71
saba    -0.70
erik    -0.70
fortn   -0.70
Positive Logits
sacrifice     0.78
sacrificing   0.72
selfless      0.72
sacrifices    0.71
sacrificed    0.68
sacrific      0.67
sacrifice     0.64
empio         0.62
sacrific      0.58
voluntarily   0.57
Activations Density 0.339%