INDEX
Explanations
instances of clichés or stereotypical expressions
Negative Logits
Token       Logit
ee          -0.16
rie         -0.16
otec        -0.16
cej         -0.15
combe       -0.15
itive       -0.14
leys        -0.14
BITS        -0.14
sw          -0.14
tee         -0.14

Positive Logits
Token       Logit
éd           0.19
ypical       0.18
isms         0.18
ipsis        0.16
istic        0.16
.rc          0.16
endencies    0.15
ysical       0.15
acho         0.15
aurus        0.15
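The two tables list the tokens whose output logits this feature most decreases and most increases. Purely as an illustration, the ranking step can be sketched in plain Python. The sketch assumes that each token's logit effect is the dot product of the feature's decoder direction with that token's unembedding column. That is a common convention for dashboards like this one, but this page does not state it. `decoder_direction`, `unembed`, `vocab`, and the toy numbers are all hypothetical.

```python
# Minimal sketch (assumption): logit effect = decoder direction . unembedding column.
# All inputs below are hypothetical toy values, not data from this feature.

def logit_effects(decoder_direction, unembed):
    """Return one logit effect per vocabulary entry.

    `unembed` is a list of d_model rows, each holding one value per token.
    """
    n_vocab = len(unembed[0])
    return [
        sum(d * unembed[i][j] for i, d in enumerate(decoder_direction))
        for j in range(n_vocab)
    ]


def top_and_bottom(vocab, effects, k):
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked[-k:]))  # most positive first
    return negative, positive


# Toy usage: a 2-dimensional "model" with a 4-token vocabulary.
vocab = ["ypical", "isms", "ee", "rie"]
unembed = [
    [0.3, 0.1, -0.2, 0.0],
    [0.0, 0.2, 0.0, -0.2],
]
direction = [1.0, 0.5]
negative, positive = top_and_bottom(vocab, logit_effects(direction, unembed), k=2)
print("negative:", negative)
print("positive:", positive)
```

On a real model the unembedding matrix is large, so a vectorized top-k would replace the Python loops. The sorting logic stays the same.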
Activations Density 0.039%
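A density of 0.039% is a fraction of 0.00039, or roughly 39 of every 100,000 tokens. The sketch below shows this arithmetic. It assumes that density means the share of token positions where the feature's activation is strictly above a threshold of zero. Both the threshold and the function name are hypothetical choices.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy usage: 39 active positions out of 100,000 gives 0.039% density.
acts = [0.0] * 99_961 + [1.0] * 39
print(f"{activation_density(acts) * 100:.3f}%")
```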