Explanations
questions being posed in the text
Negative Logits
Token   Logit
它      -0.21
resident        -0.17
(it     -0.16
，它    -0.15
nelle   -0.15
It      -0.15
,it     -0.15
aries   -0.15
nó      -0.15
Erot    -0.15
Positive Logits
Token   Logit
/w      0.27
they    0.26
nt      0.24
we      0.24
ady     0.23
tha     0.21
these   0.21
/is     0.21
они     0.20
you     0.20
Activations Density 0.063%