INDEX
Explanations
phrases related to a contrast or contradiction
phrases characterized by evaluative or opinionated expressions
Negative Logits
luaj          -0.70
ukong         -0.62
newsp         -0.60
dit           -0.60
Laughs        -0.59
ologically    -0.58
bies          -0.58
welf          -0.56
iyah          -0.55
Introduced    -0.55
Positive Logits
nonetheless    1.29
nevertheless   1.22
hardly         1.04
still          1.03
unlikely       1.02
undeniable     1.00
undeniably     0.99
doubtful       0.97
unclear        0.97
certainly      0.94
Activations Density: 0.172%
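
The density figure can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only; it assumes "density" means the fraction of positions whose activation exceeds a threshold of zero. The function name `activation_density` and the toy activation list are hypothetical and are not taken from the dashboard or its underlying data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the
    feature is active. The dashboard may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 1000 token positions, 2 of which activate the feature.
toy_activations = [0.0] * 998 + [1.3, 0.4]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.172% would mean the feature is active on roughly 17 of every 10,000 token positions.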