INDEX
Explanations
the word "Despite" in texts
phrases that indicate contrast or contradiction
Negative Logits
oided      -0.82
ANS        -0.73
culated    -0.71
ecycle     -0.71
rone       -0.70
ribe       -0.69
iaries     -0.66
area       -0.66
isa        -0.65
irez       -0.65
Positive Logits
assurances       1.01
setbacks         1.01
appearances      1.00
acknowledging    0.98
seeming          0.95
being            0.92
having           0.91
boasting         0.91
foregoing        0.89
warnings         0.88
Activations Density: 0.058%