INDEX
Explanations
phrases related to social responsibility and community support efforts
Negative Logits
ough        -0.17
mium        -0.16
exe         -0.15
/welcome    -0.14
etine       -0.14
ibling      -0.14
ABA         -0.14
achu        -0.14
abal        -0.14
ekyll       -0.14
Positive Logits
action      0.28
action      0.27
-action     0.25
actions     0.25
/action     0.24
Action      0.24
actions     0.24
Action      0.22
ACTION      0.22
Actions     0.22
Activations Density: 0.167%