INDEX
Explanations
phrases related to social and political issues, economic inequality, and community welfare
Negative Logits
auga          -0.62
uador         -0.59
arter         -0.57
ande          -0.55
thood         -0.55
zie           -0.54
nikov         -0.53
andum         -0.53
ritz          -0.53
gat           -0.52
Positive Logits
pires         0.77
turns         0.65
translates    0.61
ifies         0.57
happens       0.56
turned        0.54
itself        0.54
ãĤ©           0.53
describes     0.53
¢             0.53
Activations Density: 12.495%