INDEX
Explanations
phrases indicating a recommendation to view content
news headlines that are labeled as "MUST WATCH."
Negative Logits
acia  -0.71
ised  -0.64
bleed  -0.63
bun  -0.62
nib  -0.61
clay  -0.60
halluc  -0.60
envy  -0.58
recl  -0.58
amalg  -0.58
POSITIVE LOGITS
WATCH  0.75
VIDEOS  0.73
Watching  0.69
esome  0.68
IMAGES  0.68
------------------------------------------------  0.68
dog  0.67
...]  0.63
ARDS  0.62
degree  0.62
Activations Density 0.010%
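
As a rough illustration of what a density figure like this can mean, the sketch below assumes that density is the fraction of sampled token positions on which the feature's activation exceeds zero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: density = (# active positions) / (# positions sampled).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 10,000 token positions.
acts = [0.0] * 9_999 + [3.2]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.010%
```

Under that assumed definition, a density of 0.010% corresponds to the feature being active on about one token in every ten thousand.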