    Explanations

    positive adjectives related to pleasant experiences

    the word "nice" and its variations in different contexts

    Negative Logits
    rained      -0.82
    arian       -0.80
    arians      -0.79
    aer         -0.76
    ogens       -0.75
    inant       -0.75
    wear        -0.74
    uilding     -0.73
    arers       -0.73
    alone       -0.70
    Positive Logits
     nice           0.88
     additions      0.86
     enough         0.84
     reception      0.81
     bye            0.78
     touches        0.77
     consolation    0.77
     bonus          0.76
     breeze         0.74
     fluffy         0.73
    Activation Density: 0.014%

    No Known Activations