Explanations
issues related to social justice and economic disparity
Negative Logits
Token      Logit
702        -0.16
ayah       -0.15
922        -0.14
iets       -0.14
属         -0.13
923        -0.13
odus       -0.13
uria       -0.13
oque       -0.13
incon      -0.13
Positive Logits
Token      Logit
gon         0.15
onders      0.14
テル        0.14
-expanded   0.13
ross        0.12
Έ           0.12
obec        0.12
наче        0.12
anten       0.12
embod       0.12
Activation Density: 0.335%
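Dashboards of this kind usually derive the negative and positive logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the lowest and highest scoring tokens. Activation density is reported as the share of token positions where the feature is active. Below is a minimal pure-Python sketch that assumes this convention. The function names, the d_model x vocab layout of `unembed`, and the `> 0` firing threshold are illustrative assumptions, not this dashboard's actual code.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical helper: score every vocabulary token by projecting the
    feature's decoder direction (length d_model) through an unembedding
    matrix laid out as d_model rows x vocab_size columns (assumed layout)."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int):
    """Return the k most negative and k most positive (token, score) pairs,
    mirroring the 'Negative Logits' / 'Positive Logits' lists."""
    if k <= 0:
        return [], []
    order = sorted(range(len(scores)), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of token positions where the feature fires (assumed: > 0)."""
    if not activations:
        return 0.0
    return 100.0 * sum(1 for a in activations if a > 0) / len(activations)


# Toy usage: 2-dimensional model, 3-token vocabulary (made-up numbers).
scores = logit_effects([1.0, -0.5], [[0.2, -0.1, 0.0], [0.4, 0.2, -0.3]])
neg, pos = top_and_bottom(scores, ["a", "b", "c"], k=1)
print(neg, pos)
print(activation_density([0.0, 1.2, 0.0, 0.0]))
```

Sorting the full score list once keeps the sketch simple. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` / `heapq.nlargest` would avoid the full sort.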