Explanations
phrases related to negative or harmful actions or behaviors, especially involving violence or illegal activities
phrases related to abusive or predatory behavior
Negative Logits
Token          Logit
univers        -0.76
uitive         -0.72
ahime          -0.71
Revision       -0.68
usterity       -0.68
icult          -0.67
rastructure    -0.67
Effective      -0.65
ircraft        -0.65
ãĤ¤            -0.64
Positive Logits
Token          Logit
stole          1.09
proceeded      1.07
drank          1.05
subsequently   1.05
ate            1.03
secondly       1.03
assaulted      1.03
raped          1.02
overheard      1.02
interfered     1.02
Activations Density 0.345%
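
The page does not define the density figure. A minimal sketch, assuming it means the fraction of sampled token positions on which the feature's activation is above zero (a common convention for this kind of dashboard, not something the source confirms). The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 3 active positions out of 2000 sampled tokens.
sample = [0.0] * 1997 + [0.8, 1.3, 2.1]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.345% would correspond to the feature firing on roughly 3 or 4 of every 1000 sampled tokens.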