INDEX
Explanations
Twitter handles
colons and social media references
Negative Logits
cannibal     -0.69
partic       -0.69
abal         -0.69
fres         -0.65
ictional     -0.64
recomp       -0.63
sealed       -0.62
immersion    -0.61
prol         -0.61
transpl      -0.60
Positive Logits
edin         0.84
@            0.73
dog          0.73
Redditor     0.72
Whats        0.71
hasht        0.71
MSN          0.71
Twitch       0.69
Tumblr       0.68
Tumblr       0.68
Activations Density: 0.118%