INDEX
Explanations
mentions of social media usernames and handles
mentions of social media accounts and interactions with them
Head Attr Weights
0:0.17
1:0.05
2:0.07
3:0.13
4:0.04
5:0.11
6:0.07
7:0.03
8:0.11
9:0.07
10:0.07
11:0.03
Negative Logits
®            -1.39
virtues      -1.30
therein      -1.29
astronauts   -1.26
treasures    -1.25
arsenic      -1.21
pleasures    -1.17
Sodium       -1.11
fishes       -1.11
etheless     -1.10
Positive Logits
yp           1.58
union        1.33
riot         1.29
spr          1.29
record       1.28
oll          1.26
poll         1.25
nee          1.24
hz           1.24
gyn          1.23
Activations Density: 0.019%