INDEX
Explanations
references to personal or possessive pronouns
Negative Logits
pline    -0.15
itsu     -0.15
kla      -0.15
throp    -0.14
your     -0.14
erc      -0.13
onet     -0.13
fty      -0.13
701      -0.13
yours    -0.13
Positive Logits
CLS      0.15
icus     0.14
ensi     0.13
าà¸      0.13
Perr     0.13
Semi     0.13
pockets  0.13
_paper   0.13
gii      0.13
FD       0.12
Activations Density 0.110%