Explanations
negation followed by contrast
Negative Logits
Token       Value
anything    1.38
any         1.30
Anything    1.15
qualquer    1.14
anything    1.11
cualquier   1.08
任何        1.08
qualsiasi   1.07
anyone      1.06
ANYTHING    1.04
Positive Logits
Token       Value
Nor         1.54
nor         1.49
Nor         1.36
nor         1.12
ni          0.97
nem         0.92
Neither     0.89
Neither     0.89
也不是      0.87
Nem         0.87
Activations Density 0.130%