Explanations
Concepts related to self-sacrifice and the willingness to endure hardships for others.
Negative Logits
alach          -0.17
icari          -0.15
roud           -0.15
/vendor        -0.15
Handled        -0.14
leigh          -0.14
cong           -0.14
_Default       -0.14
laz            -0.14
yper           -0.14
Positive Logits
sacrifice       0.63
sacrifices      0.55
sacr            0.54
Sacr            0.54
sacrificing     0.49
sacrific        0.48
sacrificed      0.47
SAC             0.40
sac             0.38
acr             0.35
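The page does not say how these logit lists were produced. Dashboards of this kind often compute them by taking the dot product of the feature's decoder direction with each token's unembedding vector, then listing the highest and lowest scores. The sketch below assumes that method. The function names `logit_effects` and `top_and_bottom`, and the toy vectors, are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Tuple


def logit_effects(direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the (assumed) feature decoder
    direction with that token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec)) for tok, vec in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative token scores."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Hypothetical 2-dimensional toy model; real models use d_model in the hundreds or thousands.
    direction = [1.0, 0.0]
    unembed = {"sacrifice": [0.6, 0.1], "sac": [0.4, 0.0], "alach": [-0.2, 0.5]}
    pos, neg = top_and_bottom(logit_effects(direction, unembed), k=1)
    print("positive:", pos)
    print("negative:", neg)
```

Under this assumption, a large positive score means that adding the feature's direction to the residual stream raises the logit of that token, which fits the "sacrifice" family of tokens dominating the positive list.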
Activation Density: 0.051%
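Activation density is usually read as the fraction of sampled tokens on which the feature fires. The page does not state its threshold or sample size, so the sketch below assumes "fires" means an activation strictly above zero. The function name `activation_density` is hypothetical. At 0.051%, the feature would fire on roughly 51 tokens out of every 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 51 firing tokens out of 100,000.
    acts = [0.0] * 99_949 + [1.0] * 51
    print(f"{activation_density(acts):.3%}")
```

A density this low indicates a sparse, fairly specific feature, consistent with its narrow set of positive-logit tokens.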