Explanations
phrases related to enforcing rules, mandates, or duties
mentions of authority and imposition
Negative Logits
Tribe       -0.62
voyage      -0.58
beck        -0.58
Sioux       -0.57
Clyde       -0.56
room        -0.56
ovies       -0.56
Bethlehem   -0.56
kers        -0.56
adelphia    -0.55
Positive Logits
ocide        0.71
mantra       0.71
stricter     0.70
WS           0.70
onz          0.69
compulsion   0.68
rules        0.68
azon         0.67
bug          0.67
atars        0.67
Activations Density: 0.261%