INDEX
Explanations
references to subreddit names
references to various online communities and subreddits
Negative Logits
utherland   -0.72
Isa         -0.67
bills       -0.65
imilar      -0.65
Samar       -0.63
ierrez      -0.62
ogly        -0.62
Tray        -0.62
Booth       -0.61
NRS         -0.61
Positive Logits
occup       0.86
DATA        0.86
bt          0.82
soc         0.82
umblr       0.81
social      0.81
Own         0.81
politics    0.81
Tumblr      0.80
roller      0.80
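The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way to produce such lists, assumed here rather than confirmed for this dashboard, is to project the feature's decoder direction through the model's unembedding matrix and take the top and bottom k entries. The function name `top_logit_effects` and the toy matrix below are hypothetical and for illustration only.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive logit effects.

    Assumption (hypothetical): the logit effect of a feature is its decoder
    direction projected directly through the unembedding, decoder_dir @ W_U.
    """
    effects = decoder_dir @ W_U            # one value per vocabulary token
    order = np.argsort(effects)            # indices sorted ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens (made-up numbers).
vocab = ["Tumblr", "politics", "Booth", "Tray"]
W_U = np.array([[0.8, 0.5, -0.6, -0.1],
                [0.0, 0.3, 0.0, -0.5]])
decoder_dir = np.array([1.0, 0.0])

neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Because only the sort order matters, the same function works for any vocabulary size; the dashboard's lists correspond to k = 10.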
Activation density: 0.027%
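A density of 0.027% means the feature is active on roughly 27 of every 100,000 tokens. The sketch below shows that arithmetic, assuming density is the fraction of tokens whose activation exceeds zero; the threshold and the helper name `activation_density` are assumptions, not taken from the dashboard.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature fires (activation > threshold).

    The default threshold of 0.0 is an assumption.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# 27 active tokens out of 100,000 gives the density reported above.
acts = np.zeros(100_000)
acts[:27] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```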