INDEX
Explanations
social media related posts or activities
references to various types of posts and media in a social context
Negative Logits
assies       -0.78
lees         -0.73
corridors    -0.70
atives       -0.69
fame         -0.69
verty        -0.68
rooms        -0.68
pees         -0.68
inations     -0.67
riages       -0.66
Positive Logits
spokesperson  0.86
ccording      0.82
researcher    0.77
example       0.75
spokeswoman   0.74
spokesman     0.73
stub          0.73
refres        0.73
biologist     0.72
therapist     0.69
Activations Density 0.318%