INDEX
Explanations
phrases related to authority and control
phrases indicating denial or negation
Negative Logits
adolesc            -0.69
CTV                -0.63
compat             -0.61
rossover           -0.61
rencies            -0.60
IFE                -0.58
atoes              -0.58
transformations    -0.58
ricular            -0.58
speculation        -0.58
Positive Logits
intervene    1.27
relent       1.01
approve      1.00
sanction     0.99
evict        0.97
incentiv     0.97
swoop        0.95
dictate      0.95
forbid       0.91
veto         0.91
Activations Density 0.668%
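
The density figure above is given without a definition. A minimal sketch of one common reading follows, assuming the density is the percentage of sampled tokens on which the feature's activation is above zero. The function name, the threshold parameter, and the toy data are all hypothetical and do not come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all; the dashboard may use a different definition.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical toy data: 1000 tokens, the feature fires on 7 of them.
sample = [0.0] * 993 + [1.5] * 7
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.668% would mean the feature is active on roughly 7 of every 1000 tokens in the sample.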