Explanations
words related to politeness and respect
terms related to courtesy and respectful behavior
Negative Logits
Token          Logit
ortium         -0.86
Sett           -0.62
Price          -0.62
Citation       -0.61
Bravo          -0.59
resso          -0.58
Fargo          -0.57
Rx             -0.55
Shutterstock   -0.55
Heard          -0.55
Positive Logits
Token          Logit
ous            1.23
ously          1.20
OUS            0.98
hing           0.87
anship         0.84
astic          0.84
iously         0.81
aunts          0.80
astically      0.79
ctory          0.79
Activations Density 0.137%