Explanations
phrases related to policies or actions being put in place to address various issues or challenges
phrases related to establishing systems or frameworks
Negative Logits
mania     -0.83
ogi       -0.80
ishi      -0.79
lez       -0.76
STER      -0.75
iddle     -0.73
Enjoy     -0.73
anders    -0.71
ubi       -0.70
fest      -0.69
Positive Logits
safeguards     1.48
guidelines     1.32
incentives     1.22
adequate       1.19
mechanisms     1.18
guidance       1.17
corrective     1.14
appropriate    1.13
penalties      1.12
protections    1.10
Activations Density: 0.526%