Explanations
mentions of current US politics and related events
Negative Logits
ende          -0.74
incorpor      -0.72
ponder        -0.70
imagination   -0.66
mushroom      -0.65
unwanted      -0.64
controvers    -0.64
salv          -0.63
jog           -0.63
pleasures     -0.62
Positive Logits
ï¸ı           1.26
¯             0.95
ÃĽ            0.92
STEM          0.91
ï¸            0.91
cue           0.86
°             0.85
âĢł           0.82
said          0.81
âϦ            0.81
Activations Density 1.478%