Explanations
negations or denials
phrases that convey negation or denial
Negative Logits
éĹĺ       -0.89
è¿        -0.73
oided     -0.70
çļ        -0.69
WAY       -0.68
éĥ        -0.68
åº        -0.67
çĶŁ       -0.67
æĥ        -0.66
itcher    -0.65
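Several negative-logit tokens look like garbled Latin text. A likely explanation is that the model uses a GPT-2-style byte-level BPE vocabulary. In that scheme, each token string encodes raw bytes through a reversible byte-to-character table, and some tokens are fragments of multi-byte UTF-8 characters. This is an assumption, because the dashboard does not name the model or tokenizer. The minimal sketch below reproduces GPT-2's published byte-to-unicode table and inverts it. The helper names `CHAR_TO_BYTE` and `token_to_bytes` are hypothetical.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2's reversible byte -> printable-character table."""
    # Printable Latin-1 bytes map to the character with the same code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # The remaining bytes (controls, space, etc.) are shifted to code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Hypothetical helper: invert the table so token characters map back to bytes.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes that a byte-level BPE token string stands for."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


for tok in ["éĹĺ", "è¿", "çļ", "çĶŁ", "itcher"]:
    raw = token_to_bytes(tok)
    # errors="replace" shows U+FFFD for incomplete multi-byte fragments.
    print(repr(tok), raw, raw.decode("utf-8", errors="replace"))
```

Under this assumption, "çĶŁ" maps to b'\xe7\x94\x9f', which is the complete UTF-8 encoding of 生. By contrast, "è¿" and "çļ" are only the first two bytes of three-byte characters.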
Positive Logits
necessarily   1.07
icable        0.99
orious        0.96
hin           0.96
uncommon      0.95
exactly       0.92
advisable     0.87
quite         0.87
easy          0.87
epad          0.86
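The dashboard does not say how these two lists are produced. One common approach, assumed here, projects the feature's decoder direction through the model's unembedding matrix and then reports the tokens with the largest and smallest resulting values. The pure-Python sketch below shows that computation on toy numbers. It omits details a real pipeline might include, such as a final layer norm. The function names, the matrix layout (d_model rows by vocab columns), and the toy values are all hypothetical.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list[float]:
    """Dot the feature's decoder direction with each vocab column of W_U.

    Assumed layout: unembed[i][t] is row i (model dimension) and column t (token).
    """
    vocab = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab)
    ]


def top_and_bottom(effects: Sequence[float], vocab_strings: Sequence[str], k: int):
    """Return the k most positive and k most negative (token, value) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    bottom = [(vocab_strings[t], effects[t]) for t in order[:k]]
    top = [(vocab_strings[t], effects[t]) for t in reversed(order[-k:])]
    return top, bottom


# Toy example: a 2-dimensional model with a 4-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, 1.0, -1.0, 0.0],
           [0.4, -2.0, 0.0, 1.0]]
effects = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(effects, ["a", "b", "c", "d"], k=2)
print("positive:", top)
print("negative:", bottom)
```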
Activation Density: 0.086%
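A density of 0.086% corresponds to 0.00086 as a fraction, or roughly 86 tokens in every 100,000. This sketch assumes density means the share of tokens in some sample corpus on which the feature's activation is above zero. The page states neither the corpus nor the threshold. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable, Sequence


def activation_density(activations: Iterable[Sequence[float]],
                        threshold: float = 0.0) -> float:
    """Fraction of tokens, across all sequences, whose activation exceeds threshold."""
    active = total = 0
    for seq in activations:
        for a in seq:
            total += 1
            if a > threshold:
                active += 1
    return active / total if total else 0.0


# Toy data: 2 of 10 tokens fire, giving a density of 0.2 (20%).
density = activation_density([[0.0, 1.3, 0.0, 0.0], [0.0, 0.0, 0.0, 2.1, 0.0, 0.0]])
print(f"{density:.3%}")
```

A feature that fires this rarely contributes to very few tokens. That is consistent with a narrow concept such as negation or denial, though density alone does not establish what the feature represents.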