INDEX
Explanations
phrases related to community improvement and altruism
NEGATIVE LOGITS
Token          Logit
ordes          -0.16
CTS            -0.14
deniz          -0.14
itty           -0.14
eniz           -0.14
åł¡            -0.13
utos           -0.13
insky          -0.13
/mainwindow    -0.13
_busy          -0.13
POSITIVE LOGITS
Token          Logit
benefit        0.40
common         0.39
greater        0.34
Benefit        0.33
common         0.31
Common         0.30
/common        0.30
good           0.28
greater        0.28
COMMON         0.28
Activations Density 0.086%
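
The density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading only. It assumes density means the fraction of tokens with activation above zero; the function name `activation_density` and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature is active (activation > threshold).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 86 of 100,000 tokens.
sample = [1.0] * 86 + [0.0] * 99_914
print(f"{activation_density(sample):.3%}")  # prints 0.086%
```

Under this reading, a density of 0.086% means the feature is active on roughly 1 token in 1,160.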