INDEX
    Explanations

    positive adjectives describing things as pleasant or enjoyable

    instances of the word "nice" used in various contexts

    Negative Logits
    Token            Logit
    "arians"         -0.82
    "ogens"          -0.81
    "authorized"     -0.78
    "ichen"          -0.74
    "omics"          -0.73
    "arian"          -0.72
    "eligible"       -0.71
    "inant"          -0.71
    "interrupted"    -0.69
    "rained"         -0.68
    Positive Logits
    Token            Logit
    " gesture"       0.87
    " nice"          0.83
    " additions"     0.80
    "bye"            0.79
    " breeze"        0.79
    " touches"       0.77
    " little"        0.76
    " bye"           0.76
    " enough"        0.76
    "ño"             0.75
    Act Density 0.030%

    No Known Activations