Explanations
words indicating similarities or comparisons
Negative Logits
essa      -0.17
ром       -0.15
redient   -0.15
urai      -0.14
esktop    -0.14
raki      -0.13
antar     -0.13
ilip      -0.13
uled      -0.13
jinak     -0.13
Positive Logits
nhau      0.26
those     0.22
what      0.21
собою     0.19
ours      0.19
unto      0.19
собой     0.19
other     0.17
those     0.17
ones      0.17
Activations Density 0.158%