Explanations
references to social interactions and relationships
Negative Logits
entirety     -0.22
upon         -0.16
tumblr       -0.16
ìĿ´ëĬĶ       -0.16
snapchat     -0.16
rubbish      -0.15
oric         -0.15
typically    -0.15
ãģ¨ãģªãĤĭ    -0.15
curated      -0.14
Positive Logits
everybody    0.17
aggrav       0.17
sez          0.16
Everybody    0.16
nik          0.16
real         0.16
REAL         0.16
Everybody    0.15
Naw          0.15
anybody      0.15
Activations Density: 2.254%
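
Two entries in the Negative Logits list, ìĿ´ëĬĶ and ãģ¨ãģªãĤĭ, look like tokens printed in the byte-to-character alphabet used by GPT-2 style byte-level BPE vocabularies rather than as readable text. The sketch below reverses that mapping. It assumes the tokens come from such a vocabulary, which this page does not state; the helper name decode_token is hypothetical and is not part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2 style byte-level BPE.

    Assumption: the dashboard's tokenizer follows this convention.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Bytes without a printable form are remapped to code points 256 and up.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return dict(zip(byte_values, map(chr, code_points)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a vocabulary string back into UTF-8 text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ìĿ´ëĬĶ", "ãģ¨ãģªãĤĭ"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the two entries decode to the Korean 이는 and the Japanese となる. Plain ASCII tokens such as "upon" pass through unchanged.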