INDEX
Explanations
titles or phrases containing actions or commands
references to songs and musical works
Negative Logits
artifacts   -0.76
oidal       -0.74
quo         -0.72
sidew       -0.71
imposed     -0.69
warts       -0.69
intent      -0.68
chem        -0.68
strengths   -0.66
ickr        -0.66
Positive Logits
Us          1.12
Them        1.05
Hate        1.02
Guys        1.00
Wrong       0.97
Love        0.97
Own         0.95
Believe     0.94
Dating      0.94
Happ        0.93
Activations Density 0.205%