INDEX
Explanations
references to social media platforms and their usage
Negative Logits
Websites    -0.16
Web         -0.16
rám         -0.15
aska        -0.15
ternet      -0.15
Tweet       -0.14
Web         -0.14
WEB         -0.14
bucks       -0.14
Guardian    -0.14
Positive Logits
account     0.49
accounts    0.42
page        0.37
account     0.35
Account     0.31
Accounts    0.31
handle      0.31
_account    0.30
channel     0.30
feed        0.30
Activations Density: 0.050%