INDEX
Explanations
reflexive pronouns and phrases related to self-reference
Negative Logits
dra       -0.17
itself    -0.16
himself   -0.16
nc        -0.14
ncy       -0.14
Wheel     -0.14
apolis    -0.14
ochrome   -0.14
enheim    -0.14
eming     -0.14
Positive Logits
zelf      0.26
elf       0.19
/us       0.16
lef       0.16
elves     0.15
'icon     0.15
acey      0.15
zel       0.14
IPP       0.14
erv       0.14
Activations Density: 0.067%