INDEX
Explanations
terms related to offense or aggression in sports contexts
Negative Logits
ofilm      -0.16
oling      -0.16
ellen      -0.15
ait        -0.15
.unpack    -0.15
ÐĶÐļ       -0.15
shot       -0.14
Rab        -0.14
oud        -0.14
etro       -0.14
Positive Logits
263           0.17
iciel         0.16
ulty          0.15
Spoon         0.14
masturbating  0.14
NamedQuery    0.14
chan          0.14
intim         0.14
lett          0.13
icions        0.13
Activations Density 0.008%