INDEX
Explanations
phrases related to ongoing debates or discussions
frequent mentions of the word "the."
Negative Logits
Layer      -0.79
note       -0.77
ngth       -0.75
emonium    -0.75
bery       -0.71
NB         -0.71
ESPN       -0.71
elaide     -0.71
strom      -0.70
†          -0.70
Positive Logits
impending      1.21
possibility    1.20
slightest      1.18
legality       1.15
whereabouts    1.08
direction      1.07
latter         1.07
outcome        1.07
upcoming       1.06
adequ          1.04
Activations Density 0.406%
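
As a rough illustration of how a density figure like the one above could be computed, the sketch below assumes that "density" means the fraction of sampled token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the toy data are all assumptions for illustration and are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density is defined as (active positions) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 4 of which fire.
toy = [0.0] * 996 + [0.7, 1.3, 0.2, 2.1]
print(f"{activation_density(toy):.3%}")  # prints 0.400%
```

Under this reading, a density of 0.406% would mean the feature fires on roughly 4 of every 1000 sampled tokens.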