INDEX
Explanations
references to media and criticism of social issues
Negative Logits
amazingly       -0.19
:-)             -0.18
âĢIJ             -0.18
hubby           -0.18
<![             -0.17
<![             -0.16
although        -0.16
electronically  -0.15
Blogs           -0.15
``              -0.15
Positive Logits
ãĥ¼             0.25
surve           0.21
âģ              0.20
                0.19
incentiv        0.19
âĢĬ             0.18
ðŁij            0.18
                0.18
THREAD          0.18
fuck            0.18
Activations Density 0.543%