INDEX
Explanations
phrases indicating significant negative impact or consequence
negative impacts and their effects on various groups or entities
Negative Logits
Token       Logit
NESS        -0.69
wine        -0.67
mutated     -0.66
whore       -0.63
fw          -0.62
atro        -0.60
supplied    -0.60
cles        -0.59
rods        -0.58
qi          -0.58
Positive Logits
Token         Logit
morale        0.77
welf          0.69
ãĤ®           0.66
campaigners   0.64
whistle       0.62
whistlebl     0.62
Palestin      0.62
ibaba         0.62
vae           0.62
warts         0.62
Activations Density 0.182%