INDEX
Explanations
references to social inequalities and calls for reform
Negative Logits
whom    -0.16
å¾      -0.15
IDX     -0.15
wins    -0.14
zion    -0.14
θυ      -0.14
cee     -0.14
ataire  -0.14
youre   -0.14
suche   -0.13
Positive Logits
that    0.32
이가    0.24
が      0.23
что     0.23
가      0.23
That    0.22
yang    0.22
that    0.21
που     0.21
which   0.21
Activations Density 0.297%