Explanations
terms related to rejection or disapproval
NEGATIVE LOGITS
Token     Logit
uckle     -0.08
-thirds   -0.07
sey       -0.06
ux        -0.06
utton     -0.06
ellen     -0.06
/read     -0.06
UX        -0.06
[blank]   -0.06
vor       -0.06

POSITIVE LOGITS
Token     Logit
ably      0.09
ively     0.09
/ref      0.08
resher    0.08
imated    0.07
hur       0.07
æİī       0.07
genes     0.07
SvÄĽt     0.07
rance     0.07
Activations Density 0.010%
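
Two of the positive-logit tokens, SvÄĽt and æİī, look like raw byte-level BPE token strings rather than readable text. The sketch below shows one way to turn such strings back into text. It assumes the tokens come from a GPT-2-style byte-level tokenizer, which the page does not state. The helper names bytes_to_unicode and decode_token are illustrative and not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> character table in the style of GPT-2 byte-level BPE.

    Printable bytes map to themselves; every other byte is shifted
    to a code point at 256 or above so each byte has a visible symbol.
    """
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100))  # U+00AE .. U+00FF
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map each displayed character back to its byte, then decode as UTF-8.

    Raises KeyError if the token contains a character outside the table
    (assumption: the input is a raw byte-level token string).
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["SvÄĽt", "æİī", "ably"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, SvÄĽt decodes to Svět and æİī decodes to 掉, while plain ASCII tokens such as ably are unchanged. A token that is only a fragment of a multi-byte character would decode to the replacement character instead.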