Explanations
phrases related to social and economic disparities
Negative Logits
stal      -0.83
aroo      -0.81
within    -0.79
terms     -0.74
hov       -0.73
Ń·        -0.72
gram      -0.72
çīĪ       -0.71
ARY       -0.71
roo       -0.71
Positive Logits
enthusiasm        1.05
luster            0.95
specificity       0.95
sympathy          0.94
manpower          0.93
clarity           0.93
resources         0.93
urgency           0.93
opportunities     0.92
sophistication    0.92
Activations Density 0.057%