INDEX
Explanations
sentences that discuss comparisons and contrasts between entities
Negative Logits
zwar            -0.17
elin            -0.17
ulumi           -0.15
èϽçĦ¶          -0.15
vice            -0.14
ianne           -0.14
볨              -0.14
olley           -0.14
Ïģιν            -0.14
à¸Īะà¹Ħà¸Ķ      -0.14
Positive Logits
nonetheless     0.21
also            0.20
nevertheless    0.19
nowhere         0.17
also            0.16
still           0.16
lez             0.16
è¿ĺæĺ¯          0.15
tera            0.15
597             0.15
Activations Density: 0.124%