INDEX
Explanations
phrases related to policies, regulations, and transparency
Negative Logits
infamous     -0.70
strangely    -0.67
ties         -0.66
inexpl       -0.65
oddly        -0.65
wonders      -0.65
curiously    -0.63
speculated   -0.62
unlucky      -0.62
bizarre      -0.62
Positive Logits
ASAP         1.21
responsibly  1.03
adequate     0.93
respectful   0.91
unbiased     0.90
cknow        0.90
properly     0.90
truthful     0.89
equitable    0.88
proper       0.88
Activations Density: 4.908%