INDEX
Explanations
expressions related to manners, specifically focusing on concepts of rudeness and politeness
terms related to rudeness and politeness
NEGATIVE LOGITS
lisher    -0.86
ernels    -0.83
ishop     -0.75
panel     -0.75
hart      -0.74
razil     -0.73
ARK       -0.71
lished    -0.68
ilation   -0.68
yrinth    -0.68
POSITIVE LOGITS
rude            0.93
rud             0.87
etiquette       0.87
awakening       0.81
polite          0.81
manners         0.79
disrespectful   0.78
disrespect      0.78
greeting        0.76
respectfully    0.73
Activations Density 0.041%