Explanations
words related to inappropriate or offensive content
terms related to obscenity and vulgarity
Negative Logits
Token     Logit
boarding  -0.87
zig       -0.87
ctl       -0.82
starter   -0.81
pei       -0.80
iard      -0.79
TT        -0.79
backs     -0.77
onen      -0.74
woods     -0.74
Positive Logits
Token      Logit
blasp      0.97
obsc       0.96
uously     0.82
lihood     0.77
Gutenberg  0.77
tracts     0.75
exhib      0.74
urized     0.73
writ       0.72
obscene    0.71
Activations Density: 0.034%
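For a sense of scale, a density of 0.034% corresponds to roughly 34 active positions per 100,000 tokens. The sketch below assumes that "activations density" means the fraction of token positions on which the feature's activation is above zero; the page itself does not define the metric. The function name `activation_density` and the sample data are hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 34 of which activate the feature.
example = [0.0] * 99_966 + [1.5] * 34
density = activation_density(example)
print(f"{density:.3%}")  # prints 0.034%
```

Under this reading, the feature is silent on the vast majority of tokens, which is consistent with a feature tied to a narrow vocabulary such as the obscenity-related terms listed above.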