Explanations
specific rules or guidelines mentioned in a text
mentions of rules and regulations
Negative Logits
ience      -0.82
ONSORED    -0.76
Bras       -0.72
ienced     -0.65
Astro      -0.63
Dak        -0.62
igated     -0.62
issance    -0.61
aged       -0.61
experien   -0.61
Positive Logits
rules            0.99
governing        0.96
Rule             0.93
violations       0.90
Rules            0.88
book             0.87
breakers         0.82
DragonMagazine   0.82
lawy             0.82
books            0.81
Activations Density 0.020%