INDEX
Explanations
words related to news articles, updates, and information
engagement prompts and advertising content
Negative Logits
respectively    -0.53
mechanically    -0.51
fundamentals    -0.48
alone           -0.48
fame            -0.47
Valve           -0.47
partying        -0.46
aesthetic       -0.46
indifferent     -0.45
deciding        -0.45
Positive Logits
UNCLASSIFIED    0.83
WATCHED         0.75
CNN             0.74
POLIT           0.73
News            0.72
Politics        0.71
POLITICO        0.70
embed           0.68
Transcript      0.67
politics        0.65
Activations Density: 0.874%