INDEX
Explanations
phrases that indicate skepticism or contradiction regarding official statements and policies
Negative Logits
Ab              -0.15
private         -0.15
XC              -0.15
Replies         -0.15
lescope         -0.14
Newman          -0.14
821             -0.14
.tp             -0.13
DebugEnabled    -0.13
aylor           -0.13
Positive Logits
ffen            0.17
ometry          0.16
ucer            0.15
gel             0.15
oner            0.15
=key            0.15
ucher           0.14
ÛĮÙĨÙĩ          0.14
aukee           0.14
quila           0.14
Activations Density 0.218%