INDEX
Explanations
the word "you" in various forms and contexts
Negative Logits
HING     -0.15
ington   -0.14
aukee    -0.14
YLES     -0.14
stoff    -0.14
íĥĦ      -0.14
Rein     -0.14
Pf       -0.13
íĮ       -0.13
amax     -0.13
Positive Logits
-même    0.23
zelf     0.20
/us      0.19
adic     0.16
itable   0.15
Spicer   0.15
uld      0.15
Norm     0.15
guys     0.14
ixe      0.14
Activations Density 0.093%