Explanations
Twitter handles and usernames
usernames or handles associated with social media accounts
Negative Logits
Levant           -0.72
anwhile          -0.71
electronically   -0.63
Uriel            -0.60
Bashar           -0.60
Avalon           -0.58
lobbying         -0.56
anonymously      -0.55
PACs             -0.55
Frankenstein     -0.54
Positive Logits
dL               1.09
ZI               1.06
dq               1.05
ZX               1.01
9                1.00
0                1.00
zx               0.98
(token missing)  0.97
8                0.97
Q                0.96
Activations Density: 0.059%