INDEX
Explanations
expressions related to negative traits or behaviors, particularly selfishness
concepts and discussions surrounding selfishness and altruism
Negative Logits
Token                      Logit
Downloadha                 -1.05
BuyableInstoreAndOnline    -0.77
annis                      -0.75
gran                       -0.74
UN                         -0.69
AUT                        -0.68
ORN                        -0.68
enegger                    -0.66
Room                       -0.66
ANN                        -0.64
Positive Logits
Token                      Logit
istical                     1.07
istic                       1.06
minded                      0.97
istically                   0.92
altru                       0.88
selfish                     0.86
motive                      0.84
rifice                      0.84
ly                          0.83
bies                        0.81
Activations Density 0.017%
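
The page does not say how the density figure is computed. As a rough sketch, assuming it is the percentage of sampled tokens on which the feature's activation is above zero, the calculation could look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to reproduce a value of 0.017%.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all; the source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 17 of 100,000 tokens.
sample = [1.0] * 17 + [0.0] * 99_983
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.017% means the feature is active on roughly 17 of every 100,000 tokens, which would fit a feature tied to a narrow topic such as selfishness and altruism.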