INDEX
Explanations
does not necessarily affect
Negative Logits
not        0.78
不仅仅     0.73
όχι        0.69
nejen      0.66
lidt       0.66
trochu     0.64
nicht      0.64
too        0.62
忍不住     0.62
somewhat   0.62
POSITIVE LOGITS
quelconque   0.85
いずれ       0.84
herhangi     0.80
qualquer     0.67
quaisquer    0.67
whatsoever   0.66
fundamento   0.66
любой        0.65
任何         0.64
quelcon      0.64
Activations Density 0.244%