Explanations
personal opinions or thoughts expressed in the first person
first-person pronouns indicating self-reference
Negative Logits
Token         Logit
ļéĨĴ          -0.66
affles        -0.61
screen        -0.58
ienne         -0.58
kward         -0.56
Rockefeller   -0.56
Bent          -0.55
Papers        -0.55
Jindal        -0.55
Butter        -0.54
Positive Logits
Token         Logit
'm            1.48
've           1.35
'll           1.21
guess         1.08
suppose       1.07
'd            1.05
nex           0.97
think         0.94
believe       0.92
MAX           0.90
Activations Density: 0.279%