Explanations
phrases expressing contrasts or contradictions
conjunctions and transitional phrases indicating contrast or exception
Negative Logits
Token       Logit
vance       -0.73
nan         -0.72
rontal      -0.69
olid        -0.68
flush       -0.67
rys         -0.67
register    -0.67
lish        -0.67
ribed       -0.66
etts        -0.66
Positive Logits
Token            Logit
alas             1.00
beware           0.99
downside         0.94
unfortunately    0.91
hindered         0.85
lacks            0.84
drawbacks        0.80
drawback         0.77
hampered         0.77
lacked           0.76
Activations Density: 0.428%