Explanations
adjectives related to following rules or regulations
references to law-abiding individuals and their actions
Negative Logits
hy      -0.77
oult    -0.73
Hosp    -0.70
ammy    -0.70
kill    -0.68
ilk     -0.67
cair    -0.66
Nou     -0.65
olog    -0.65
tips    -0.63
Positive Logits
abiding         1.41
abiding         1.08
ĸļ              0.86
compr           0.75
obe             0.75
withstanding    0.73
ctuary          0.73
¿½              0.73
rats            0.71
ignt            0.70
Activations Density 0.011%