INDEX
Explanations
references to entertainment and games
Negative Logits
Fol      -0.18
fol      -0.16
OK       -0.15
:-)      -0.15
:)       -0.15
fwd      -0.15
;-       -0.15
tod      -0.14
OK       -0.14
folks    -0.14
POSITIVE LOGITS
Rarity   0.21
._       0.20
Jaune    0.19
tumblr   0.17
^^       0.16
derp     0.16
~↵       0.15
âĹ       0.15
uw       0.15
gay      0.15
Activations Density: 1.985%