INDEX
Explanations
verbs indicating actions or changes
personal reflections and expressions of feelings
Negative Logits
themselves     -0.66
respectively   -0.60
omin           -0.59
apiece         -0.57
Their          -0.56
anwhile        -0.54
occupants      -0.53
turnover       -0.51
Us             -0.51
incumb         -0.51
Positive Logits
myself          1.47
my              0.96
crochet         0.76
blogging        0.67
oan             0.65
poke            0.64
personally      0.62
writing         0.62
aido            0.61
watching        0.60
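The two lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. Below is a minimal sketch of that ranking. It assumes the logit effects are available as a token-to-score mapping. The function `top_logits` is hypothetical, and the scores are a small subset copied from the lists above rather than the full vocabulary.

```python
# Hypothetical sketch: select the k most negative and k most positive
# logit effects from a token -> score mapping. Assumes scores are already
# computed; the values below are a subset copied from the lists above.
def top_logits(
    scores: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending by score
    negative = [item for item in ranked if item[1] < 0][:k]  # most negative first
    positive = [item for item in reversed(ranked) if item[1] > 0][:k]  # most positive first
    return negative, positive


scores = {
    "themselves": -0.66,
    "respectively": -0.60,
    "omin": -0.59,
    "myself": 1.47,
    "my": 0.96,
    "crochet": 0.76,
}

neg, pos = top_logits(scores, 2)
print(neg)  # [('themselves', -0.66), ('respectively', -0.6)]
print(pos)  # [('myself', 1.47), ('my', 0.96)]
```

Filtering on the sign keeps a token from appearing in both lists when k exceeds the number of negative or positive entries.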
Activation density: 0.954%
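Activation density is usually read as the share of tokens on which the feature fires. Under that reading, 0.954% means roughly 1 token in 105. The sketch below assumes a token counts as active when its activation is strictly above a threshold of 0. The function `activation_density` and the toy activations are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens whose
# feature activation exceeds a threshold (assumed to be 0 here).
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)  # count firing tokens
    return 100.0 * active / len(activations)  # express as a percentage


# One of four toy tokens is active.
print(activation_density([0.0, 0.0, 2.3, 0.0]))  # 25.0
```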