INDEX
Explanations
words and phrases that express vulgarity or frustration
Negative Logits
elle    -0.20
dl      -0.19
elas    -0.17
elli    -0.17
el      -0.17
ements  -0.16
ellers  -0.16
ess     -0.16
lite    -0.16
eler    -0.16
Positive Logits
sterol  0.19
ucid    0.17
prit    0.17
heck    0.16
iferay  0.16
inese   0.16
unteer  0.16
abyrin  0.16
itude   0.15
mazon   0.15
Activations Density: 0.060%