INDEX
Explanations
URLs related to specific posts or tweets, likely from social media platforms
occurrences of URLs or web links
Negative Logits
compet      -0.65
shame       -0.64
exit        -0.63
advert      -0.61
arsen       -0.61
armoured    -0.60
duck        -0.60
emouth      -0.58
imperson    -0.57
pronoun     -0.57
Positive Logits
status      1.23
posts       1.07
videos      1.03
comments    1.02
issues      1.01
sets        0.91
photos      0.89
articles    0.87
wiki        0.86
tumblr      0.84
Activations Density: 0.026%