Explanations
phrases indicating negation or denial
negations or phrases expressing denial or exclusion
Negative logits
ushes     -0.71
uan       -0.66
Tens      -0.65
Ö¼        -0.65
Ĥİ        -0.64
ourney    -0.64
riber     -0.63
awks      -0.63
velt      -0.61
anders    -0.60
Positive logits
necessarily  1.09
icably       1.03
hin          1.03
eworthy      1.02
orious       1.00
permitted    0.99
icable       0.98
amused       0.95
yet          0.94
ifying       0.90
Activation density: 0.126%
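The density figure is presumably the share of token positions on which this feature fires. The following is a minimal sketch, assuming "fires" means an activation strictly above zero; the function name `activation_density` and its `threshold` parameter are hypothetical illustrations, not part of any dashboard's API.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions where the feature fires.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical example: 2 of 8 positions are active.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(acts):.3f}%")  # 25.000%
```

A density around 0.126% would mean the feature is active on roughly 1 or 2 tokens per thousand, which fits a narrowly scoped feature like the negation/denial pattern described above.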