Explanations
instances where a contrast or comparison is being made
the recurrent use of the word "while."
Negative Logits
bed             -0.84
icted           -0.79
red             -0.77
ahime           -0.74
esi             -0.73
ursed           -0.70
uced            -0.70
sted            -0.69
bard            -0.69
aer             -0.67
Positive Logits
acknowledging    1.02
conced           0.88
researching      0.85
browsing         0.82
admitting        0.81
respecting       0.81
touring          0.74
agreeing         0.73
technically      0.72
condemning       0.70
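Lists like the two above rank tokens by how strongly the feature pushes their logits up or down. The sketch below shows that ranking step only. It assumes a hypothetical dictionary `logit_effects` mapping tokens to per-token logit effects (seeded with a few of the values listed above) and a hypothetical helper `top_and_bottom`. How the dashboard actually computes the effects is not stated here.

```python
# Hypothetical per-token logit effects for one feature: a small subset of the
# values listed above. A real dashboard would score the whole vocabulary.
logit_effects = {
    "acknowledging": 1.02, "conced": 0.88, "researching": 0.85,
    "browsing": 0.82, "bed": -0.84, "icted": -0.79, "red": -0.77,
}

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, value) pairs."""
    # Sort from most positive to most negative effect.
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    # Keep only strictly positive / strictly negative entries for each list.
    positive = [pair for pair in ranked if pair[1] > 0][:k]
    negative = [pair for pair in reversed(ranked) if pair[1] < 0][:k]
    return positive, negative

pos, neg = top_and_bottom(logit_effects, 3)
print(pos)  # [('acknowledging', 1.02), ('conced', 0.88), ('researching', 0.85)]
print(neg)  # [('bed', -0.84), ('icted', -0.79), ('red', -0.77)]
```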
Activation density: 0.044%
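As a rough reading of the density figure, 0.044% corresponds to about 44 active tokens per 100,000. The sketch below assumes that density means the fraction of sampled tokens on which the feature's activation is strictly positive. The dashboard's exact threshold and token sample are not given, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations):
    """Fraction of tokens on which the feature fires (assumed: activation > 0)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)

# Toy example: 44 active tokens out of 100,000 gives a density of 0.044%.
acts = [0.0] * 100_000
for i in range(44):
    acts[i * 1000] = 1.5  # arbitrary positive activation value
density = activation_density(acts)
print(f"{density:.3%}")  # 0.044%
```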