INDEX
Explanations
questions related to personal wealth and societal roles
Negative Logits
inski       -0.15
gan         -0.15
.generated  -0.14
anford      -0.14
hol         -0.14
ford        -0.14
GAN         -0.14
owe         -0.14
ups         -0.14
GO          -0.14
Positive Logits
altogether  0.20
overall     0.19
Overall     0.18
All         0.17
Scale       0.15
Overall     0.15
amespace    0.15
共同        0.15
all         0.14
all         0.14
Activations Density 0.074%