Explanations
phrases indicating a lack of deviation or exceptionalism
phrases that indicate a comparison or highlight differences
Negative Logits
Token         Logit
Milky         -0.69
Pax           -0.66
Advisory      -0.61
DA            -0.60
periodic      -0.60
Bots          -0.60
Hitch         -0.59
Supplemental  -0.59
Moder         -0.59
OK            -0.57
Positive Logits
Token         Logit
whatsoever    1.00
nor           0.77
Transform     0.72
interstitial  0.71
UFC           0.71
etheless      0.70
bleacher      0.70
aunt          0.70
nor           0.69
tes           0.68
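Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below assumes that procedure; this page does not confirm it, nor whether the listed values are rescaled. The function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical helper (assumed procedure): a feature's logit effect on a token is the
# dot product of the feature's decoder direction with that token's unembedding vector.
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    return {tok: sum(d * w for d, w in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

# Hypothetical helper: keep the k most promoted and the k most suppressed tokens.
def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]                  # largest effects first
    negative = list(reversed(ranked))[:k]  # most negative effects first
    return positive, negative

# Toy data (made up): a 3-dimensional residual stream and a four-token vocabulary.
decoder_dir = [1.0, 0.0, -0.5]
unembed = {
    "whatsoever": [1.0, 0.2, -0.5],
    "nor": [0.8, 0.0, 0.1],
    "Milky": [-0.6, 0.3, 0.2],
    "Pax": [-0.4, 0.0, 0.5],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

The raw dot products are left unnormalized here. Whether the table above divides by the largest effect (its top value is exactly 1.00) is not stated, so any rescaling would be a further assumption.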
Activation Density: 0.048%
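Activation density is usually read as the percentage of tokens in a sample on which the feature's activation is nonzero. Under that reading, 0.048% means the feature fires on roughly one token in 2,083. The sketch below assumes that definition and a threshold of zero, and the page states neither. `activation_density` and the toy activations are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total

# Toy example (made up): the feature fires on 5 of 10,000 tokens.
acts = [0.0] * 10_000
for i in (12, 480, 3_071, 6_650, 9_999):
    acts[i] = 1.3
density = activation_density(acts)
print(f"density: {density:.3f}%")

# Converting a percentage density into an average spacing between firings.
print(f"tokens per firing at 0.048%: {100 / 0.048:.0f}")
```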