Explanations
terms related to negative social behaviors or medical conditions
terms related to antisocial behavior and its incidence in various contexts
Negative Logits
士       -0.98
loo     -0.85
cia     -0.81
aby     -0.78
aults   -0.76
peat    -0.74
atche   -0.74
ikan    -0.74
ilts    -0.73
lins    -0.73
Positive Logits
incidence     0.87
cipline       0.70
INST          0.69
è£ıè¦ļéĨĴ      0.68
sidx          0.65
uated         0.64
oral          0.63
Jude          0.63
uations       0.63
Malfoy        0.63
Activations Density: 0.025%