INDEX
Explanations
instances where the text contrasts a situation with potential negative implications against something else or raises concerns
phrases indicating a contrast or comparison
Negative Logits
Token         Logit
si            -0.82
RIP           -0.79
atos          -0.76
bowl          -0.76
itect         -0.74
pecially      -0.72
PI            -0.70
utm           -0.68
().           -0.68
Its           -0.68
Positive Logits
Token         Logit
nonetheless   1.04
etheless      0.98
undeniable    0.80
nevertheless  0.76
deeper        0.75
chers         0.74
curiously     0.74
undeniably    0.72
challeng      0.72
broader       0.71
Activations Density 0.592%