INDEX
Explanations
comparisons involving "like."
Negative Logits
Token       Logit
اÙĤÙĦ        -0.16
chap        -0.14
atsby       -0.14
atorium     -0.14
ÎŃ          -0.14
ulet        -0.13
idding      -0.13
AFP         -0.13
Generation  -0.13
票          -0.13
Positive Logits
Token       Logit
unto        0.21
antan       0.17
iliki       0.16
-minded     0.16
Beste       0.14
mini        0.14
endar       0.14
ervas       0.14
zes         0.14
minded      0.14
Activations Density 0.054%