INDEX
Explanations
phrases indicating requirements or instructions
phrases expressing necessity or requirements
Negative Logits
Token      Logit
Ń·         -0.64
Democr     -0.63
Dism       -0.62
delinqu    -0.62
crime      -0.60
terness    -0.60
Sounds     -0.59
berra      -0.58
Pill       -0.58
emies      -0.57
Positive Logits
Token            Logit
lessly           1.17
nces             0.77
convincing       0.72
xus              0.71
OTH              0.68
patience         0.67
ache             0.67
esome            0.67
assistance       0.67
accommodations   0.66
Activations Density 0.060%