INDEX
Explanations
personal pronouns, particularly "I"
Negative Logits
oca             -0.16
apol            -0.15
volt            -0.15
vik             -0.15
lili            -0.14
ysi             -0.14
让我            -0.14
AllowAnonymous  -0.14
ën              -0.13
ezier           -0.13
Positive Logits
SED             0.17
orch            0.15
think           0.15
eyen            0.15
personally      0.15
rium            0.14
Lazar           0.14
Johannes        0.14
IMP             0.14
quin            0.14
Activations Density 0.105%