Explanations
phrases related to selfish behavior
references to selfishness and altruism
Negative Logits
Token         Logit
Downloadha    -0.88
ĸļ            -0.80
annis         -0.79
STD           -0.72
gran          -0.71
quart         -0.71
acea          -0.70
Ibid          -0.70
ORN           -0.68
Room          -0.68
Positive Logits
Token         Logit
selfish        1.11
istical        1.08
altru          0.99
minded         0.88
greed          0.88
motive         0.85
istic          0.83
rifice         0.82
motives        0.80
istically      0.79
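The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way to derive such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest entries. The minimal sketch below uses pure Python; the names `w_dec`, `W_U`, `logit_effects`, and `top_and_bottom` are hypothetical, and any layer-norm scaling the real dashboard may apply is ignored.

```python
from typing import List, Sequence, Tuple


def logit_effects(w_dec: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix W_U (d_model rows x d_vocab columns).

    Assumption: the dashboard's logit scores are (roughly) w_dec @ W_U.
    """
    d_model, d_vocab = len(W_U), len(W_U[0])
    return [sum(w_dec[i] * W_U[i][j] for i in range(d_model)) for j in range(d_vocab)]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy usage with a few scores copied from the tables above.
vocab = ["selfish", "istical", "altru", "Downloadha", "annis", "Room"]
scores = [1.11, 1.08, 0.99, -0.88, -0.79, -0.68]
positive, negative = top_and_bottom(scores, vocab, k=2)
print(positive)  # [('selfish', 1.11), ('istical', 1.08)]
print(negative)  # [('Downloadha', -0.88), ('annis', -0.79)]
```

Sorting the whole vocabulary is fine for a sketch; with a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid the full sort.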
Activation Density: 0.012%
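Activation density is usually read as the percentage of sampled tokens on which the feature fires at all; under that assumption, 0.012% means about 12 firings per 100,000 tokens, or roughly one token in 8,300. The sketch below computes it from a flat list of activations; the function name `activation_density` and the strict `> 0` threshold are assumptions, and the dashboard's exact definition and sampling set may differ.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature "fires" on a token when its activation is > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy usage: 12 non-zero activations among 100,000 tokens.
acts = [0.0] * 100_000
for i in range(12):
    acts[i * 8_000] = 1.5
print(activation_density(acts))
```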