Explanations
words and phrases that imply inclusivity or the presence of multiple elements
Negative Logits
また       -0.17
么        -0.16
igo      -0.14
дост     -0.14
istr     -0.14
обо      -0.14
tır      -0.13
neither  -0.13
oster    -0.13
osit     -0.13
Positive Logits
ones     0.48
those    0.45
those    0.34
:        0.32
ones     0.32
Those    0.29
Those    0.28
Ones     0.27
:↵       0.27
تلك      0.24
Activations Density 0.205%