INDEX
Explanations
words related to social issues and challenges
phrases indicating negative societal issues or challenges
Negative Logits
olid          -0.72
yna           -0.66
pione         -0.65
rael          -0.65
aphael        -0.65
cffffcc       -0.64
tesy          -0.62
estern        -0.62
future        -0.61
POST          -0.61
Positive Logits
THERE         0.81
there         0.75
there         0.69
temperatures  0.65
hairs         0.65
Defendants    0.65
owing         0.64
There         0.64
};            0.62
lacks         0.62
Activations Density: 0.485%