INDEX
Explanations
the word "normal" or variations of it in the text
terms related to normalization and societal standards
Negative Logits
better     -0.76
artisan    -0.75
hani       -0.69
leted      -0.67
iosyncr    -0.67
Winged     -0.63
resent     -0.62
Theft      -0.62
REL        -0.62
çīĪ        -0.61
Positive Logits
cy          1.58
ization     1.45
izing       1.43
ised        1.40
isation     1.39
izes        1.32
ising       1.24
ized        1.24
ize         1.20
izations    1.17
Activations Density: 0.042%