INDEX
Explanations
subreddits
the presence of Reddit-style formatting or subreddit references
NEGATIVE LOGITS
SOS        -0.67
Nieto      -0.67
creen      -0.66
Cotton     -0.66
Reloaded   -0.66
Mile       -0.65
CPR        -0.64
Corpus     -0.63
GPS        -0.63
Pom        -0.63
POSITIVE LOGITS
outine     1.39
ussia      1.34
iding      1.25
outing     1.22
uling      1.21
ansom      1.21
abbit      1.18
ussian     1.18
ifles      1.17
aspberry   1.17
Activations Density 0.044%