INDEX
Explanations
elements related to social interactions and contexts
Negative Logits
even          -0.25
even          -0.20
sogar         -0.19
especially    -0.19
also          -0.19
too           -0.18
竣            -0.18
despite       -0.18
both          -0.17
-even         -0.17
Positive Logits
nothing       0.46
NOTHING       0.41
nothing       0.40
Nothing       0.36
thôi          0.36
Nothing       0.34
nada          0.28
nowhere       0.28
ничего        0.27
saja          0.26
Activations Density 0.060%