INDEX
Explanations
Twitter usernames
references to social media accounts, particularly Twitter handles
Negative logits
ACTIONS     -0.97
Scheme      -0.79
Ninth       -0.76
CSI         -0.74
Direction   -0.74
XVI         -0.74
FAC         -0.74
Hearts      -0.74
Index       -0.71
Edge        -0.70
Positive logits
podcast     1.04
obal        1.01
_           0.97
yp          0.96
photos      0.95
kr          0.94
sth         0.93
anmar       0.92
aily        0.92
dn          0.92
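Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below assumes that convention. The inputs `w_dec` (decoder vector), `w_u` (d_model × vocab unembedding) and `vocab`, along with the helpers `logit_effects` and `top_and_bottom`, are all hypothetical. The toy numbers do not reproduce the values listed above, and a real dashboard may add normalization or other steps.

```python
from typing import List, Tuple

def logit_effects(
    w_dec: List[float], w_u: List[List[float]], vocab: List[str]
) -> List[Tuple[str, float]]:
    """Project a feature's decoder direction through the unembedding.

    w_dec: the feature's decoder vector, length d_model (hypothetical input).
    w_u:   unembedding matrix stored as d_model rows x len(vocab) columns.
    Returns one (token, dot(w_dec, column j of w_u)) pair per vocabulary entry.
    """
    effects = []
    for j, token in enumerate(vocab):
        score = sum(w_dec[i] * w_u[i][j] for i in range(len(w_dec)))
        effects.append((token, score))
    return effects

def top_and_bottom(
    effects: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive logit effects."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive

# Toy numbers only; they are not this feature's actual weights.
w_dec = [1.0, -0.5]
w_u = [
    [0.75, -1.0, 0.5, 0.0],
    [0.5, 0.0, -1.0, 1.0],
]
vocab = ["podcast", "ACTIONS", "photos", "Scheme"]
negative, positive = top_and_bottom(logit_effects(w_dec, w_u, vocab), k=2)
print(negative)  # [('ACTIONS', -1.0), ('Scheme', -0.5)]
print(positive)  # [('photos', 1.0), ('podcast', 0.5)]
```

In this reading, a positive entry means that when the feature fires, it pushes the model toward predicting that token. A negative entry means it pushes the model away from that token.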
Activation density: 0.137%
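Activation density is usually read as the fraction of tokens in an evaluation sample on which the feature fires. At 0.137%, that works out to roughly one token in 730. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The helper `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Toy sample: 2 of 1,000 tokens fire, giving a density of 0.2%.
acts = [0.0] * 998 + [3.1, 0.7]
print(f"{activation_density(acts):.3%}")  # 0.200%
```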