INDEX
Explanations
phrases that indicate alternative perspectives or rephrasings
Negative Logits
inka      -0.16
allon     -0.15
_bw       -0.14
vak       -0.14
ses       -0.14
ãĥ¼ãĤ     -0.14
застав    -0.14
iko       -0.14
ntag      -0.13
rtl       -0.13
Positive Logits
words     0.46
words     0.37
Words     0.31
.words    0.29
_words    0.29
Words     0.28
(words    0.23
palabras  0.21
wards     0.20
слова     0.19
Activations Density 0.013%