INDEX
Explanations
negations or phrases that indicate the absence of something
NEGATIVE LOGITS
anych    -0.17
ąd       -0.15
uso      -0.15
оу       -0.15
uj       -0.15
ga       -0.14
anni     -0.14
ntp      -0.14
Kou      -0.14
roid     -0.14
POSITIVE LOGITS
matter    0.34
wonder    0.27
matter    0.26
Matter    0.23
doubt     0.21
one       0.21
strand    0.20
amount    0.19
sooner    0.18
surprise  0.18
Activations Density 0.045%