INDEX
Explanations
various forms of the word "this" and related demonstrative terms in different contexts
Negative Logits
scor      -0.16
bah       -0.16
angelo    -0.15
agraph    -0.15
angered   -0.14
inge      -0.14
身        -0.14
vas       -0.14
icone     -0.14
igar      -0.14
Positive Logits
ifar      0.17
ekil      0.16
UPLE      0.16
endas     0.15
auer      0.15
žÃŃ       0.15
GLISH     0.15
deflate   0.14
kun       0.14
olem      0.14
Activations Density 0.001%