INDEX
Explanations
instances of the word "rules" with an activation level of 9 or 10
references to regulations or guidelines
Negative Logits
Token       Logit
ience       -0.86
experien    -0.77
itate       -0.72
aged        -0.70
issance     -0.68
ONSORED     -0.68
Bras        -0.68
apest       -0.67
ienced      -0.66
Gothic      -0.63
Positive Logits
Token         Logit
governing      1.07
lawy           0.97
Enforcement    0.96
breakers       0.92
book           0.90
enforcement    0.90
rules          0.88
violations     0.87
breaker        0.84
books          0.83
Activations Density: 0.024%