INDEX
Explanations
descriptions of actions or suggestions related to helping or aiding others
pronouns and actions related to potential or capability
Negative Logits
sqor          -0.78
millenn       -0.65
partName      -0.64
Cosponsors    -0.62
attm          -0.59
Arri          -0.58
fraught       -0.58
treacher      -0.58
lately        -0.57
Nay           -0.57
Positive Logits
can           1.13
wouldn        1.03
could         1.02
wont          1.02
'll           0.91
dont          0.91
won           0.85
sylv          0.84
avoids        0.80
maxim         0.79
Activations Density: 0.154%