INDEX
Explanations
sentences where a statement is made followed by a subsequent clarification or additional information
phrases that indicate contradictory or nuanced statements
Negative Logits
hiba         -0.77
sidx         -0.73
sqor         -0.69
everal       -0.66
Cosponsors   -0.65
aminer       -0.64
roxy         -0.64
Uriel        -0.62
perty        -0.61
Quote        -0.60
Positive Logits
necessarily   1.22
anymore       1.04
anything      1.00
nor           0.99
nor           0.94
magically     0.90
any           0.88
ANY           0.80
infall        0.79
anything      0.79
Activations Density 0.299%
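
The page does not say how the density figure is computed. One common reading, assumed here, is the fraction of token positions on which the feature's activation is nonzero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 3 of which activate the feature.
acts = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(acts)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.299% would mean the feature fires on roughly 3 of every 1000 token positions.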