Explanations
news headlines including phrases instructing to 'Read more'
phrases and references that indicate significant events or actions in a narrative context
Negative Logits
hooked      -0.73
answ        -0.69
mosqu       -0.64
Cyp         -0.61
aggreg      -0.61
Balanced    -0.58
ingested    -0.57
Compact     -0.57
hosted      -0.56
converted   -0.56
Positive Logits
than        0.90
prev        0.78
inctions    0.73
pel         0.72
ukong       0.71
rug         0.71
Fra         0.70
haw         0.70
Premium     0.68
advant      0.68
Activations Density 0.054%