INDEX
Explanations
phrases related to news headlines or current events
mentions of locations and political contexts
Negative Logits
anwhile       -0.67
itored        -0.54
respectively  -0.52
ashington     -0.51
local         -0.50
Local         -0.48
srf           -0.47
igible        -0.47
redited       -0.47
named         -0.46
Positive Logits
coin          0.56
guy           0.56
sag           0.53
sucks         0.50
guys          0.50
devs          0.49
Coin          0.47
!)            0.46
gamer         0.45
istor         0.45
Activations Density: 2.571%