INDEX
Explanations
personal pronouns followed by verbs or adjectives describing actions or attributes
pronouns in the text
Negative Logits
dp          -0.67
Ĥ¬          -0.67
dding       -0.64
Reply       -0.63
icial       -0.61
inton       -0.59
GCC         -0.59
essions     -0.58
orses       -0.57
612         -0.57
Positive Logits
excel       0.78
thri        0.77
humble      0.72
behaves     0.69
morph       0.68
popularity  0.68
thrive      0.68
creators    0.68
ubiqu       0.66
existed     0.66
Activations Density: 0.802%