INDEX
Explanations
contractions of verbs in context
pronouns indicating self-reference and second-person address
Negative Logits
Site        -0.74
Click       -0.69
trace       -0.69
Tweet       -0.68
ð           -0.67
LIN         -0.66
SP          -0.65
IB          -0.65
EP          -0.64
ãĤ¹ãĥĪ      -0.64
Positive Logits
gonna       0.77
been        0.77
selves      0.71
entitled    0.70
Been        0.70
been        0.68
gon         0.64
definitely  0.64
gotta       0.62
likely      0.62
Activations Density 0.181%