Explanations
punctuation marks indicating questions or dramatic pauses
Negative Logits
Token      Logit
Locator    -0.15
hti        -0.14
εια        -0.14
erence     -0.14
구         -0.14
herself    -0.14
vek        -0.13
åīĽ        -0.13
fern       -0.13
anca       -0.13
Positive Logits
Token      Logit
I          0.30
we         0.28
they       0.20
We         0.19
none       0.18
I          0.18
there      0.18
inval      0.17
æĪij        0.16
our        0.16
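Some tokens in the lists above (for example æĪij) look like the display form of a byte-level BPE vocabulary rather than readable text. The page does not say which tokenizer produced them, so the following is only a minimal sketch that assumes a GPT-2-style byte-to-character table. The helper names `gpt2_byte_decoder` and `decode_token` are hypothetical and not part of any dashboard API.

```python
def gpt2_byte_decoder():
    """Build a display-character -> byte map.

    ASSUMPTION: the tokens use the GPT-2-style byte-level BPE convention,
    where printable Latin-1 bytes map to themselves and every other byte
    maps to a code point starting at U+0100.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted into the range above U+00FF.
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {chr(c): b for b, c in zip(byte_values, code_points)}


def decode_token(token):
    """Map a displayed token back to text; return it unchanged if it
    contains characters outside the assumed byte table."""
    table = gpt2_byte_decoder()
    if any(ch not in table for ch in token):
        return token
    raw = bytes(table[ch] for ch in token)
    # Tokens may hold partial UTF-8 sequences, so do not fail on bad bytes.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The escapes spell the characters U+00E6, U+012A, U+0133.
    print(decode_token("\u00e6\u012a\u0133"))
```

If the displayed token really consists of the characters U+00E6, U+012A, U+0133, this sketch decodes it to the Chinese first-person pronoun 我, which would fit with the other positive-logit tokens (I, we, We, our). Tokens that are already shown as readable text, such as εια or 구, fall outside the assumed table and are returned unchanged.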
Activations Density: 0.001%