INDEX
    Explanations

    references to traditional concepts or practices

    Negative Logits
    Token           Logit
    "ropolis"       -0.17
    "lying"         -0.15
    "bras"          -0.15
    " Tradition"    -0.15
    " tradition"    -0.15
    " Reputation"   -0.15
    "liness"        -0.15
    "hoa"           -0.14
    "aging"         -0.14
    "laus"          -0.14
    Positive Logits
    Token           Logit
    "ists"          0.43
    "ist"           0.38
    "ism"           0.29
    "istic"         0.28
    "ista"          0.25
    "izing"         0.25
    "ISTS"          0.25
    "ise"           0.25
    "ized"          0.24
    "isti"          0.24
    Activation Density: 0.031%

    No Known Activations