Explanations
posts or articles on social media
Negative Logits
tics: -0.83
itudes: -0.73
ses: -0.72
ernels: -0.69
Occupations: -0.63
ospels: -0.62
urches: -0.62
establishments: -0.62
virtues: -0.61
NetMessage: -0.60
Positive Logits
Called: 0.74
titled: 0.72
called: 0.70
less: 0.70
atical: 0.70
resembling: 0.69
called: 0.69
usually: 0.69
reminiscent: 0.69
alyst: 0.68
Activation Density: 0.390%