Explanations
specific pronouns and conjunctions indicating personal perspective and inclusivity
Negative Logits
SED       -0.17
_traits   -0.16
utan      -0.15
aid       -0.14
AUTHORS   -0.14
uds       -0.14
Paulo     -0.13
IDGET     -0.13
olley     -0.13
ollen     -0.13
Positive Logits
ру           0.16
appa         0.15
crawl        0.15
istrovství   0.15
ugg          0.15
HAL          0.14
ugin         0.14
873          0.14
ragments     0.14
icens        0.14
Activations Density: 0.001%