Explanations
references to personal pronouns in the text
Negative Logits
reno      -0.17
cio       -0.16
ieber     -0.16
rane      -0.16
pper      -0.15
Pier      -0.14
Ones      -0.14
dsp       -0.14
NoSuch    -0.14
ÑĤо       -0.13
Positive Logits
odd          0.18
ãĤ¯ãĥ©ãĥĸ    0.14
izarre       0.14
pNet         0.14
arp          0.14
cach         0.13
din          0.13
odd          0.13
edic         0.13
Odd          0.13
Activations Density 0.031%