INDEX
Explanations
personal pronouns and verbs indicating intentions or future actions
references to collective experiences or actions involving groups of people
Negative Logits
Reviewer    -0.80
REDACTED    -0.78
Citation    -0.76
Sheen       -0.67
ests        -0.66
esting      -0.65
ipedia      -0.63
¿½          -0.63
Examples    -0.61
Lyon        -0.60
POSITIVE LOGITS
need        1.48
're         1.41
NEED        1.31
've         1.30
haven       1.30
couldn      1.30
owe         1.26
'll         1.26
want        1.24
intend      1.23
Activations Density 0.332%