INDEX
Explanations
words related to greetings and introductions
occurrences of introductory phrases or greetings
Negative Logits
exerted      -0.76
implanted    -0.71
aults        -0.70
ebted        -0.70
requisite    -0.69
emitted      -0.67
okia         -0.67
relied       -0.67
effected     -0.66
griev        -0.65
Positive Logits
asty         0.78
ffee         0.77
Subtle       0.76
Welcome      0.75
elcome       0.74
Welcome      0.70
Begin        0.69
Paradise     0.68
yles         0.67
affle        0.67
Activations Density 0.054%