INDEX
Explanations
terms related to politics and current events, especially concerning media coverage and public opinion
Negative Logits
rar             -0.70
NESS            -0.69
soDeliveryDate  -0.68
ONSORED         -0.68
anan            -0.65
wine            -0.65
Cola            -0.64
mask            -0.63
729             -0.62
far             -0.62
Positive Logits
behalf          1.00
etime           0.94
erous           0.91
eday            0.90
autop           0.85
arrival         0.82
purpose         0.82
cue             0.82
occasion        0.81
fumes           0.81
Activations Density 0.170%