INDEX
Explanations
phrases related to moral and self-interest conflicts
Negative Logits
出版年           -0.71
ujednoznacz   -0.63
devamını      -0.57
tempfile      -0.52
ılığı         -0.51
Fordítás      -0.50
Eres          -0.48
lipop         -0.48
bello         -0.47
Kjelder       -0.47
Positive Logits
interests     1.60
Interests     1.43
interest      1.29
interests     1.29
intereses     1.22
intérêts      1.20
Interests     1.19
Interest      1.18
INTEREST      1.16
Interest      1.11
Activations Density: 0.353%