INDEX
Explanations
words related to rules, regulations, or expected behavior
references to standards of behavior and conduct
Negative Logits
inka        -0.68
iewicz      -0.64
MEN         -0.63
ixed        -0.62
Nightmares  -0.62
cial        -0.61
Sanchez     -0.61
Fried       -0.59
berth       -0.59
Cec         -0.59
Positive Logits
ors         1.03
ivity       1.00
uations     0.94
ities       0.90
onduct      0.89
ional       0.89
uated       0.82
ions        0.80
uation      0.79
ivism       0.77
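A minimal sketch of how ranked logit lists like the two above can be produced. It assumes, without confirmation from this page, that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The functions `logit_effects` and `top_and_bottom`, along with the two-dimensional toy vectors, are hypothetical illustrations and do not come from real model weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical: dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * w for d, w in zip(direction, vec)) for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative (ascending) and k most positive (descending) tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first, like "Negative Logits"
    top = ranked[::-1][:k]     # most positive first, like "Positive Logits"
    return bottom, top


# Toy inputs (made up for illustration; not the real feature or model).
direction = [1.0, 0.5]
unembed = {
    "ors": [0.8, 0.4],
    "ivity": [0.6, 0.6],
    "inka": [-0.4, -0.2],
    "MEN": [-0.3, -0.3],
}

bottom, top = top_and_bottom(logit_effects(direction, unembed), k=2)
print("negative:", bottom)
print("positive:", top)
```

Keeping both ends of a single sorted list means the negative and positive lists are always drawn from the same scores.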
Activations Density 0.023%
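A short sketch of what a density of 0.023% implies. It assumes that density is the fraction of sampled tokens on which the feature's activation is strictly positive. Under that reading, the feature fires on about 23 of every 100,000 tokens. The `activation_density` helper and the synthetic activations are hypothetical.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Hypothetical: fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic sample: 23 firing tokens out of 100,000 (illustrative only).
acts = [0.0] * 100_000
for i in range(23):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density * 100:.3f}%")
```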