Explanations
the word "Are" at the beginning of questions
questions starting with "Are" that probe for confirmation or clarification
Negative Logits
hand        -0.57
drafts      -0.57
given       -0.55
fixture     -0.55
straight    -0.54
dash        -0.54
solution    -0.53
rious       -0.53
åĤ          -0.53
priv        -0.53
Positive Logits
Are          3.21
Are          2.12
Were         2.02
Aren         1.95
Is           1.84
ARE          1.68
Did          1.59
Have         1.49
Were         1.48
Can          1.46
Activations Density 0.011%