INDEX
Explanations
negative phrases or words expressing denial and absence
NEGATIVE LOGITS
Token      Logit
rary       -0.16
ipsis      -0.16
noon       -0.15
ième       -0.15
loo        -0.15
gio        -0.14
nist       -0.14
iguous     -0.14
encers     -0.14
stants     -0.14
POSITIVE LOGITS
Token      Logit
matter      0.44
amount      0.36
matter      0.30
Matter      0.27
amount      0.26
doubt       0.25
wonder      0.23
-body       0.23
-one        0.23
Amount      0.23
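The two lists above rank vocabulary tokens by how strongly this feature pushes their next-token logits up or down. A minimal sketch of how such lists can be produced, assuming (not confirmed by this page) that each value is the feature's direct logit effect: the dot product of its decoder direction with each token's unembedding vector. The helper names `logit_effects` and `top_logit_tokens` are hypothetical, and the vectors in the demo are toy values, not real model weights.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers for illustration; not the dashboard's actual code.


def logit_effects(
    decoder_dir: List[float], unembed: Dict[str, List[float]]
) -> Dict[str, float]:
    """Assumed direct logit effect: dot product of the feature's decoder
    direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_logit_tokens(
    effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    # Sort once, highest effect first.
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    positive = ranked[:k]
    # Reverse the ranking so the most negative token comes first,
    # matching the order of the NEGATIVE LOGITS list.
    negative = ranked[::-1][:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional vectors, not real model weights.
    unembed = {
        "matter": [1.0, 0.0],
        "amount": [0.5, 0.5],
        "rary": [-1.0, 0.0],
        "noon": [0.0, -1.0],
    }
    effects = logit_effects([0.4, 0.2], unembed)
    positive, negative = top_logit_tokens(effects, k=2)
    print("POSITIVE LOGITS", positive)
    print("NEGATIVE LOGITS", negative)
```

Sorting once and slicing from both ends keeps the positive and negative lists consistent with a single ranking; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.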
Activation density: 0.044%
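Activation density is the share of sampled token positions on which the feature fires, so 0.044% corresponds to roughly 44 active positions per 100,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero; the dashboard's actual threshold and token sample are not stated here, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes a strict `> threshold` test; the dashboard may use a different rule.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy sample: 44 active positions out of 100,000.
    sample = [0.0] * 99_956 + [1.0] * 44
    print(f"Activation density: {activation_density(sample):.3f}%")
```

Returning a percentage rather than a fraction matches how the density is displayed above.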