INDEX
Explanations
phrases related to policies and government decisions
references to "policy" and its related contexts
Negative Logits
Token       Logit
Stain       -0.78
DAY         -0.74
Sparkle     -0.71
Norn        -0.69
Templ       -0.69
igans       -0.68
Ñĭ          -0.66
lighting    -0.65
Takeru      -0.64
Warranty    -0.63
Positive Logits
Token           Logit
making          1.08
makers          0.98
prescriptions   0.98
makers          0.92
agenda          0.89
objectives      0.88
stances         0.88
correctness     0.88
advisers        0.86
advisors        0.86
Activations Density: 0.047%