INDEX
Explanations
webpage welcome messages
instances of the word "Welcome."
Negative Logits
negie       -0.80
orius       -0.68
appropri    -0.68
river       -0.67
onel        -0.65
ieth        -0.65
ole         -0.65
forgiven    -0.64
riter       -0.62
rained      -0.62
Positive Logits
elcome      0.93
prise       0.89
Welcome     0.89
Reviewer    0.86
Welcome     0.82
Surprise    0.81
giving      0.80
Guest       0.79
ISSION      0.79
ISTER       0.78
Activations Density 0.021%