INDEX
Explanations
positive descriptions of people's friendliness and helpfulness
Negative Logits
wang        -0.07
ame         -0.06
zie         -0.06
(&(         -0.06
AME         -0.06
564         -0.06
eggies      -0.06
ponsored    -0.06
phies       -0.05
émon        -0.05
Positive Logits
friendly       0.10
friendly       0.09
Friendly       0.09
hospitality    0.08
-friendly      0.08
Friendly       0.08
welcoming      0.08
genuine        0.07
staff          0.07
Helpful        0.07
Activations Density 0.056%