INDEX
Explanations
phrases related to social issues and public policy
Negative Logits
aths         -0.75
tones        -0.73
aws          -0.73
amia         -0.72
affles       -0.72
ickets       -0.72
ARS          -0.71
adle         -0.70
oslav        -0.70
aughtered    -0.70
Positive Logits
phenomenon    1.03
discrepancy   1.02
culminated    0.98
article       0.98
latter        0.95
contrasts     0.94
means         0.90
includes      0.90
arrangement   0.90
translates    0.89
Activations Density: 0.151%