Explanations
comparative phrases indicating preference or contrast
Negative Logits
nonetheless   -0.16
arken         -0.15
nicht         -0.15
niet          -0.15
ERSHEY        -0.15
esco          -0.14
Ľi            -0.14
nejen         -0.14
både          -0.14
uly           -0.14
Positive Logits
necessarily   0.34
merely        0.27
vice          0.26
relying       0.25
being         0.24
mere          0.24
just          0.24
simply        0.23
having        0.23
being         0.22
Activations Density 0.090%