INDEX
    Explanations

    concepts related to commonality or shared characteristics

    Negative Logits
    tring          -0.15
    anager         -0.15
    ut             -0.15
    /ph            -0.15
    LM             -0.15
    door           -0.15
    hta            -0.15
    uations        -0.14
    actory         -0.14
    idel           -0.14

    Positive Logits
    wealth          0.26
    ities           0.19
    est             0.19
    ality           0.18
    itized          0.17
    emente          0.16
    sense           0.16
     denominator    0.16
    abb             0.16
    wy              0.15

    Act Density 0.031%

    No Known Activations