Explanations
phrases related to social and economic disparities
Negative Logits
imum         -0.87
ACTIONS      -0.78
ixel         -0.78
ITS          -0.76
acters       -0.74
cade         -0.73
issors       -0.72
aeda         -0.72
operation    -0.71
VIDEOS       -0.71
Positive Logits
hungry        1.31
vulnerable    1.27
humiliated    1.27
frail         1.27
dying         1.25
incompetent   1.25
helpless      1.25
powerless     1.24
impoverished  1.23
lonely        1.23
Activations Density 0.254%