INDEX
Explanations
words related to rule violations, particularly in sports or other competitive contexts
references to fouls or rule violations
Negative Logits
Token        Logit
edia         -0.84
ocobo        -0.76
udeau        -0.76
itan         -0.74
_>           -0.73
Roosevelt    -0.67
igslist      -0.67
etsk         -0.67
aeda         -0.66
iro          -0.66
Positive Logits
Token        Logit
cery         1.01
smelling     0.88
terness      0.87
sie          0.82
foul         0.78
mouth        0.76
s            0.75
rance        0.68
substances   0.68
misc         0.67
Activations Density 0.043%
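
The page does not say how the density figure is computed. A minimal sketch, assuming it is the share of sampled token positions on which the feature has a nonzero activation; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and not taken from the source:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of active positions; the source
    page does not give a definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 4 of them active.
toy = [0.0] * 9996 + [1.2, 0.4, 3.1, 0.9]
print(f"{activation_density(toy):.3%}")  # prints 0.040%
```

Under that assumption, a density of 0.043% would mean the feature fires on roughly 4 of every 10,000 sampled tokens.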