INDEX
Explanations
mentions of the word "well" followed by a number rating
the adverb "well."
Negative Logits
Token         Logit
ategory       -0.62
TAG           -0.59
generated     -0.58
torn          -0.58
Ts            -0.58
glam          -0.58
finance       -0.57
style         -0.57
pric          -0.57
generation    -0.56
Positive Logits
Token         Logit
well          4.61
Well          1.48
well          1.32
Well          1.27
hey           1.19
way           1.17
wait          1.11
we            1.06
worth         1.02
stable        1.01
Activations Density 0.010%