INDEX
Explanations
personal pronouns and verbs related to actions or intentions expressed in the present tense
expressions related to personal experience or perspectives
Negative Logits
similar      -0.71
sequ         -0.66
similarly    -0.64
juries       -0.64
ventions     -0.63
uthor        -0.61
ournal       -0.59
rities       -0.58
harms        -0.58
pox          -0.57
Positive Logits
Nare         0.66
behav        0.65
boils        0.64
Alias        0.64
endings      0.64
_-_          0.63
nutshell     0.62
!.           0.62
liest        0.62
AME          0.62
Activations Density 0.227%