Explanations
statements expressing opinions or thoughts
Negative Logits
Token      Logit
İ          -0.15
ungalow    -0.14
bay        -0.14
oba        -0.14
deaux      -0.14
iran       -0.14
ukan       -0.14
Gow        -0.14
se         -0.13
471        -0.13
Positive Logits
Token      Logit
cü         0.15
.fb        0.15
ICAST      0.15
prech      0.15
ÄĻd        0.15
ibo        0.14
íĥĦ        0.14
nicas      0.14
rote       0.14
égor       0.14
Activations Density 0.148%