INDEX
Explanations
phrases related to negative sentiments or criticism
words or phrases related to "demeaning" or derogatory language
Negative Logits
Elves      -0.67
tis        -0.66
heed       -0.65
ORY        -0.64
nce        -0.63
supper     -0.62
loo        -0.61
FORE       -0.60
Dub        -0.60
Hastings   -0.60
Positive Logits
agogue     1.14
igration   1.04
agog       1.04
aterial    1.04
ilit       1.02
ploy       0.98
ixed       0.96
otions     0.96
otion      0.93
ittance    0.91
Activations Density 0.018%