    Explanations

    the word "nice" with varying strengths of activation

    instances of the word "nice" and its variations

    Negative Logits
    Token          Value
    "ogens"        -0.87
    "arians"       -0.79
    "aer"          -0.77
    "yrinth"       -0.76
    "uality"       -0.75
    "ivals"        -0.74
    "inant"        -0.73
    "WIND"         -0.70
    "arian"        -0.70
    "omics"        -0.70
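    Positive Logits
    (Token strings are quoted; a leading space is part of the token, so "bye" and " bye" are distinct entries.)
    Token          Value
    " gesture"      0.82
    "ño"            0.82
    " little"       0.78
    " breeze"       0.77
    "bye"           0.76
    " tits"         0.76
    " nice"         0.76
    " touches"      0.76
    " additions"    0.73
    " bye"          0.73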
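Lists like these are typically produced by scoring every vocabulary token and keeping the extremes at each end. The sketch below shows only that ranking step, assuming the per-token scores are already available as one number per vocabulary entry; how those scores are computed (for SAE features, often the decoder direction projected through the model's unembedding) is an assumption here and is not stated on this page. The function name `top_logit_tokens` is hypothetical.

```python
import numpy as np


def top_logit_tokens(scores, vocab, k=10):
    """Return the k lowest- and k highest-scoring (token, score) pairs.

    scores: 1-D array with one value per vocabulary entry (assumed here to be
            the feature's logit effect; how it is computed is not shown).
    vocab:  list of token strings aligned with `scores`.
    """
    scores = np.asarray(scores, dtype=float)
    # Indices sorted from most negative to most positive score.
    order = np.argsort(scores)
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    # Reverse the order to read the most positive scores first.
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example using four values copied from the tables above.
vocab = ["ogens", "aer", " breeze", " nice"]
scores = [-0.87, -0.77, 0.77, 0.76]
negative, positive = top_logit_tokens(scores, vocab, k=2)
print(negative)  # [('ogens', -0.87), ('aer', -0.77)]
print(positive)  # [(' breeze', 0.77), (' nice', 0.76)]
```

Keeping the leading space in each vocabulary string matters: tokenizers treat " nice" and "nice" as different tokens, which is why both "bye" and " bye" can appear in the same list.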
    Activation density: 0.030%
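Activation density is usually read as the share of token positions on which the feature fires at all. A minimal sketch, assuming "fires" means an activation strictly above zero; the `threshold` parameter and the function name `activation_density` are hypothetical, not taken from this page.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy example: a feature that fires on 3 of 10,000 tokens.
acts = np.zeros(10_000)
acts[[17, 4_242, 9_001]] = [1.3, 0.4, 2.1]
print(f"{activation_density(acts):.3f}%")  # 0.030%
```

Read the other way, a density of 0.030% means the feature is active on roughly 3 of every 10,000 tokens, so it is a very sparse feature.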

    No Known Activations