INDEX
Explanations
terms related to social media algorithms and user interaction features
Negative Logits
webs  -0.15
stereotypes  -0.14
ulary  -0.14
obao  -0.14
Moff  -0.14
Websites  -0.14
specular  -0.14
mpar  -0.14
905  -0.13
TK  -0.13
Positive Logits
feed  0.25
wall  0.23
timeline  0.22
Feed  0.21
feed  0.21
timeline  0.21
Wall  0.21
-feed  0.20
wall  0.19
Timeline  0.19
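The two logit lists rank vocabulary tokens by how strongly the feature pushes their output logits up or down. A minimal sketch of how such a ranking can be produced, assuming (this is an assumption, not something the dashboard states) that a token's score is the dot product of the feature's decoder vector with that token's column of the unembedding matrix, with no layer-norm folding. `logit_effects` and `top_logits` are hypothetical helper names.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: score(t) = decoder_vec . unembed[:, t] (no layer-norm folding).

def logit_effects(decoder_vec, unembed, vocab):
    """Return a list of (token, score) pairs, one per vocabulary entry.

    unembed is indexed as unembed[i][t]: row i is a model dimension,
    column t is a vocabulary token.
    """
    d_model = len(decoder_vec)
    return [
        (tok, sum(decoder_vec[i] * unembed[i][t] for i in range(d_model)))
        for t, tok in enumerate(vocab)
    ]

def top_logits(effects, k):
    """Return (positive, negative): the k highest- and k lowest-scoring tokens."""
    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Toy example with a 2-dimensional model and a 4-token vocabulary.
vocab = ["feed", "wall", "webs", "TK"]
decoder_vec = [1.0, 0.5]
unembed = [
    [0.2, 0.1, -0.1, 0.0],
    [0.1, 0.2, -0.1, -0.2],
]
positive, negative = top_logits(logit_effects(decoder_vec, unembed, vocab), k=2)
print([tok for tok, _ in positive])  # ['feed', 'wall']
print([tok for tok, _ in negative])  # ['webs', 'TK']
```

Scores are returned as a list rather than a dict so that tokens whose strings look identical after extraction (the lists above contain two `feed` and two `timeline` entries, probably differing only in whitespace) are not merged.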
Activations Density 0.114%
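An activation density of 0.114% means the feature fires on roughly 114 of every 100,000 tokens. A minimal sketch of the calculation, assuming "fires" means an activation strictly greater than zero (the dashboard does not state its threshold). `activation_density` and `format_density` are hypothetical names.

```python
# Hypothetical sketch: activation density = fraction of token positions
# where the feature is active. Assumption: active means activation > threshold,
# with threshold = 0.0 by default.

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

def format_density(density):
    """Format a fraction as a percentage with three decimal places."""
    return f"{density * 100:.3f}%"

# 114 active positions out of 100,000 tokens.
acts = [0.0] * 99_886 + [1.0] * 114
print(format_density(activation_density(acts)))  # 0.114%
```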