INDEX
Explanations
names of social media platforms and website features
words and phrases that include personal pronouns or identifiers
Negative Logits
pse       -0.66
symp      -0.64
è£        -0.62
respons   -0.62
xxxx      -0.61
encour    -0.61
disg      -0.60
compe     -0.60
pheus     -0.58
remem     -0.58
Positive Logits
zbollah   0.81
Vegan     0.74
Answer    0.74
Wiki      0.70
Expand    0.70
Recipe    0.69
Updated   0.69
Favorite  0.69
Overview  0.68
SHARES    0.67
Activations Density 0.244%