INDEX
Explanations
phrases related to social issues like poverty, education, opportunities, and access to resources
Negative Logits
puzz          -0.74
hint          -0.68
overest       -0.68
ado           -0.68
curiously     -0.67
sins          -0.66
cursing       -0.66
tabl          -0.65
unlucky       -0.65
mysteriously  -0.65
Positive Logits
sustainable   1.15
equitable     1.04
inclusive     1.02
cellence      0.99
responsibly   0.96
ordable       0.95
achievable    0.89
unbiased      0.89
productive    0.88
meaningful    0.88
Activations Density 3.465%