INDEX
Explanations
words related to social class distinctions and the concept of gentility
Negative Logits
ngr         -0.17
eria        -0.15
ávání       -0.14
RelativeTo  -0.14
#af         -0.14
ution       -0.14
itz         -0.14
@nate       -0.14
@js         -0.14
ndl         -0.14
Positive Logits
191     0.15
inks    0.14
hya     0.14
ons     0.14
ix      0.14
pector  0.14
warm    0.14
ile     0.14
iped    0.14
ize     0.13
Activations Density 0.015%