Explanations
phrases containing the word "demeaning"
words and phrases related to derogatory or belittling commentary
Negative Logits
Token          Logit
unchecked      -0.75
$$$$           -0.70
Belt           -0.66
whiff          -0.66
ategory        -0.62
unexplained    -0.62
nose           -0.61
pmwiki         -0.59
heed           -0.58
[+             -0.57
Positive Logits
Token          Logit
ufact          1.05
brance         0.91
culated        0.91
azon           0.90
oleon          0.88
acements       0.87
ogyn           0.87
rium           0.86
stration       0.82
andise         0.78
Activations Density: 0.114%