Explanations
concepts related to altruism and selflessness in human behavior
Negative Logits
chez      -0.18
внутри    -0.15
inside    -0.15
لط        -0.14
Within    -0.13
within    -0.13
ufe       -0.13
Inside    -0.13
Inch      -0.13
нить      -0.13
Positive Logits
in        0.68
σε        0.32
在        0.29
في        0.29
în        0.27
ใน        0.26
在        0.25
ใน        0.25
در        0.25
in (leading non-breaking space)    0.24
Activations Density 0.815%