INDEX
Explanations
verbs related to taking action or making changes
references to implementing or enacting changes or policies
Negative Logits
ubi          -0.76
istg         -0.72
INFO         -0.72
Represent    -0.64
enery        -0.63
feld         -0.61
Rus          -0.61
pring        -0.61
ahoo         -0.61
tub          -0.60
Positive Logits
policies         1.18
reforms          1.12
stricter         1.11
measures         0.99
safeguards       0.97
strategies       0.96
stringent        0.94
corrective       0.92
recommendations  0.92
regulations      0.89
Activations Density: 0.103%