Explanations
negations and phrases expressing limitations or qualifications
Negative Logits
üb         -0.16
inos       -0.15
непри      -0.14
образ      -0.14
842        -0.14
tương      -0.14
ledged     -0.14
nowhere    -0.13
bare       -0.13
esine      -0.13
Positive Logits
just        0.30
limited     0.28
merely      0.28
exclusive   0.27
rocket      0.25
JUST        0.25
about       0.24
simply      0.24
only        0.23
rocket      0.23
Activations Density: 0.162%