INDEX
Explanations
concepts related to cooperation and altruism in social interactions
Negative Logits
istrovství   -0.15
ALA          -0.15
eri          -0.14
panies       -0.14
TION         -0.14
ottle        -0.14
GENERIC      -0.14
Clips        -0.14
_PROC        -0.14
тов          -0.13
Positive Logits
ouch         0.18
OSP          0.16
Cah          0.15
Gi           0.14
court        0.14
arent        0.13
rica         0.13
Storm        0.13
HS           0.13
cb           0.13
Activations Density 0.048%