INDEX
Explanations
references to comparisons or similarities between concepts or entities
Negative Logits
anik            -0.17
stan            -0.17
.Localization   -0.16
lsruhe          -0.14
unny            -0.14
立て            -0.14
cts             -0.14
oras            -0.14
リーズ          -0.14
abi             -0.13
Positive Logits
например        0.20
například       0.19
例如            0.18
例              0.17
مثلا            0.17
наприклад       0.15
např            0.15
exemp           0.15
یره             0.15
quelle          0.14
Activations Density 0.101%