Explanations
instances of the word "no" and its variations to indicate negation
Negative Logits
rary      -0.15
nist      -0.15
panion    -0.15
noon      -0.15
gio       -0.15
ãģĦãģĨ    -0.14
rů        -0.14
419       -0.14
ipsis     -0.14
hua       -0.14
Positive Logits
matter    0.44
amount    0.34
matter    0.31
wonder    0.29
Matter    0.28
amount    0.27
sooner    0.25
doubt     0.24
Amount    0.23
one       0.22
Activations Density: 0.047%