INDEX
Explanations
number ranges
numeric representations related to film or music collections
Negative Logits
slur          -0.63
polit         -0.62
Gawker        -0.62
SERVICE       -0.61
Wikimedia     -0.61
PubMed        -0.60
Null          -0.60
initialized   -0.60
Pastebin      -0.58
aroused       -0.58
Positive Logits
episodes       1.19
episode        1.08
installments   1.00
chapters       0.95
minute         0.95
quel           0.94
CD             0.93
minute         0.92
episode        0.90
inch           0.90
Activations Density: 0.126%