INDEX
Explanations
punctuation marks and sentence structure indicators within the text
Negative Logits
istrovstvÃŃ    -0.17
ces            -0.16
Hats           -0.15
twig           -0.14
orpion         -0.14
κη             -0.14
zew            -0.14
bish           -0.14
hra            -0.14
unts           -0.14
Positive Logits
zym            0.17
idor           0.17
uber           0.16
ARS            0.15
iller          0.15
Ñīик           0.15
¬Ĥ             0.14
agini          0.14
474            0.14
ajaran         0.14
Activations Density 0.017%