INDEX
    Explanations

    positive descriptors, particularly the word "nice" and its variations

    Negative Logits
    soever
    -0.18
     greatness
    -0.17
    tes
    -0.17
     slightest
    -0.16
    /Branch
    -0.16
    aries
    -0.15
    utes
    -0.15
    lan
    -0.15
    OM
    -0.15
    lu
    -0.15
    Positive Logits
    -looking
    0.21
    -sized
    0.18
     little
    0.18
    olson
    0.17
     nice
    0.17
    nice
    0.17
     surpr
    0.16
    ень
    0.16
     clean
    0.16
     surprises
    0.16
    Act Density 0.021%

    No Known Activations