INDEX
Explanations
email addresses
mentions of social media handles and email addresses
Negative Logits
Token        Logit
Elephant     -0.76
Sod          -0.66
embold       -0.66
retali       -0.66
cru          -0.65
Pumpkin      -0.65
catapult     -0.65
subjug       -0.65
coerc        -0.64
rehabilit    -0.63
Positive Logits
Token        Logit
gmail        1.32
#$           1.18
yahoo        1.15
gs           0.97
mic          0.94
lists        0.90
tm           0.89
gt           0.89
debian       0.89
home         0.87
Activations Density: 0.015%