    Explanations

    adjectives describing positive experiences or qualities

    Negative Logits
    " endeavor"         -0.19
    " Favorite"         -0.17
    " savory"           -0.17
    " neighborhood"     -0.17
    " maneuvers"        -0.17
    " favorite"         -0.16
    " neighborhoods"    -0.16
    " favors"           -0.16
    " behavior"         -0.16
    " swath"            -0.16
    Positive Logits
    " cracking"         0.27
    " advert"           0.24
    " programme"        0.23
    " flavours"         0.22
    " contrib"          0.22
    "intree"            0.22
    " further"          0.21
    " proportion"       0.21
    " emot"             0.20
    " £"                0.19
    Activation Density: 0.373%

    No Known Activations