INDEX
Explanations
negative sentiments and criticisms related to injustice or inequality
Negative Logits
italize       -0.15
locate        -0.15
ména          -0.15
racat         -0.15
exampleInput  -0.15
ulates        -0.15
ileen         -0.15
ieves         -0.14
ysters        -0.14
cedes         -0.14
Positive Logits
ulous         0.19
orous         0.18
arious        0.16
emonic        0.16
emic          0.16
ful           0.16
-than         0.15
icrous        0.15
eful          0.15
ellation      0.15
Activations Density 0.516%