INDEX
Explanations
phrases related to legal or authoritative actions
phrases related to enforcement actions or efforts to reduce certain behaviors
Negative Logits
sake
-0.71
ensical
-0.69
æµ
-0.68
join
-0.64
Upton
-0.63
hof
-0.61
Norn
-0.61
anecdote
-0.61
Nobel
-0.60
Tribe
-0.60
Positive Logits
orously
0.81
enforcement
0.76
territ
0.76
senal
0.71
enforced
0.71
agra
0.70
ERA
0.69
administr
0.68
against
0.67
enforcing
0.67
Activations Density 0.050%