INDEX
Explanations
phrases that emphasize the concept of helping or serving others
Negative Logits
asse          -0.16
WebResponse   -0.15
slu           -0.15
شد            -0.14
via           -0.14
pcl           -0.14
ĨĴ            -0.14
PEC           -0.14
orna          -0.14
Stuff         -0.13
Positive Logits
favors        0.42
fav           0.36
favor         0.34
favour        0.32
Favor         0.31
justice       0.29
harm          0.26
Fav           0.23
fav           0.23
favor         0.23
Activations Density 0.026%