    Explanations

    questions related to personal wealth and societal roles

    Negative Logits
    Token           Logit
    "inski"         -0.15
    "gan"           -0.15
    ".generated"    -0.14
    "anford"        -0.14
    "hol"           -0.14
    "ford"          -0.14
    "GAN"           -0.14
    "owe"           -0.14
    "ups"           -0.14
    "GO"            -0.14
    Positive Logits
    Token           Logit
    " altogether"   0.20
    " overall"      0.19
    " Overall"      0.18
    " All"          0.17
    " Scale"        0.15
    "Overall"       0.15
    "amespace"      0.15
    "共同"          0.15
    "all"           0.14
    " all"          0.14
    Activation Density: 0.074%

    No Known Activations