Explanations
instances of personal pronouns and relative clauses
Negative Logits
Token      Logit
ational    -0.16
bla        -0.15
erton      -0.14
rof        -0.14
Kis        -0.14
use        -0.14
amble      -0.14
Moon       -0.14
sponge     -0.14
ba         -0.14
Positive Logits
Token      Logit
Verde      0.15
egin       0.15
isms       0.14
离         0.14
arez       0.14
ì¤Ģ        0.14
atik       0.14
779        0.14
__;        0.14
mans       0.14
Activations Density 0.062%