INDEX
    Explanations

    phrases expressing pride and support towards different aspects or groups

    Negative Logits
    Token              Logit
    "block"            -0.69
    "ences"            -0.67
    "enz"              -0.65
    " LW"              -0.64
    "Ess"              -0.62
    " alternatives"    -0.62
    "Option"           -0.61
    "erg"              -0.60
    "specified"        -0.60
    " situations"      -0.60
    Positive Logits
    Token              Logit
    " proud"           3.76
    " ashamed"         1.89
    " Proud"           1.87
    " proudly"         1.86
    " pride"           1.69
    " pleased"         1.62
    " thankful"        1.52
    "roud"             1.50
    " grateful"        1.50
    " jealous"         1.45
    Activation Density: 0.017%

    No Known Activations