INDEX
Explanations
content that has gone viral on social media
instances of social media virality and public reactions
Negative Logits
contrace       -0.74
subordinate    -0.72
confinement    -0.70
sacrifice      -0.67
asleep         -0.66
prisoner       -0.65
screwed        -0.64
triangle       -0.64
enary          -0.64
cius           -0.63
Positive Logits
(token text missing)    0.89
Reviewer                0.87
rovers                  0.86
Gawker                  0.86
Soon                    0.85
Gamergate               0.84
Critics                 0.81
Within                  0.80
Soon                    0.80
shown                   0.79
Activations Density 0.812%