INDEX
Explanations
instances of the pronoun "I."
Negative Logits
cca        -0.17
ante       -0.17
enville    -0.16
CCA        -0.15
zbollah    -0.15
anmar      -0.15
gli        -0.14
æĿ         -0.14
nie        -0.14
autor      -0.14
Positive Logits
rof        0.17
weg        0.16
527        0.15
eon        0.15
orate      0.15
Duy        0.15
uC         0.15
nder       0.15
iates      0.14
386        0.14
Activations Density: 0.095%