INDEX
Explanations
phrases related to caring about people, issues, or specific groups
phrases that indicate concern or interest in people and their needs
Negative Logits
ross                        -0.84
cession                     -0.81
BuyableInstoreAndOnline     -0.81
hesis                       -0.78
Figure                      -0.77
atar                        -0.75
igmatic                     -0.75
Lay                         -0.74
hiba                        -0.73
Cola                        -0.71
Positive Logits
preserving      0.95
respecting      0.84
lessly          0.84
fairness        0.80
protecting      0.78
aesthetics      0.77
integrity       0.77
politics        0.75
preservation    0.75
maximizing      0.75
Activations Density 0.032%
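
To put the density figure in concrete terms, here is a minimal sketch. It assumes that "activations density" means the fraction of token positions on which the feature's activation is nonzero; the page itself does not define the term. The function name activation_density and the toy data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 3 of which fire.
toy = [0.0] * 10_000
for position in (12, 4_321, 9_876):
    toy[position] = 1.5

toy_density = activation_density(toy)
print(f"{toy_density:.3%}")

# Under the same assumption, a density of 0.032% corresponds to roughly
# this many active positions per million tokens.
per_million = round(0.00032 * 1_000_000)
print(per_million)
```

Under this reading, a density of 0.032% means the feature fires on roughly 320 of every million tokens, so it is a sparse feature.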