INDEX
Explanations
common nouns and pronouns indicating possession or relationships
Negative Logits
Token          Logit
Demp           -0.16
SCO            -0.15
tuk            -0.15
û              -0.15
Prev           -0.14
ostel          -0.14
_attachments   -0.14
OLA            -0.14
COPE           -0.14
æIJŃ           -0.14
Positive Logits
Token          Logit
acket          0.16
ber            0.15
ACKET          0.15
chet           0.15
gall           0.14
ervlet         0.14
rawn           0.14
igli           0.14
uso            0.14
iform          0.14
Activations Density: 0.001%