INDEX
Explanations
Twitter handles for different individuals
references to social media engagement, particularly on Twitter
Negative Logits
©          -0.94
´          -0.76
raped      -0.72
»          -0.71
IOR        -0.66
itual      -0.64
ocene      -0.64
°          -0.64
TextColor  -0.64
¶æ         -0.63
Positive Logits
1.19
1.14
1.04
0.99
0.88
0.86
0.84
Tweet      0.83
Instr      0.81
0.81
Activations Density 0.041%
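
As a rough illustration of the density figure, the sketch below assumes that "activations density" means the percentage of tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; none of them come from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens where the feature fires) / (all tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical data: 10,000 tokens, of which 4 activate the feature.
sample = [0.0] * 9996 + [0.7, 1.2, 0.3, 2.5]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density well below 1% indicates a sparse feature that fires on only a small share of tokens.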