INDEX
    Explanations

    the word "supposed" followed by a verb or noun, indicating expectations or intentions

    phrases indicating expectations or societal norms

    Negative Logits
    tex            -0.66
    Ey             -0.59
     Bohem         -0.58
     Flavoring     -0.58
    tein           -0.57
     Blaz          -0.57
     Splash        -0.56
    sv             -0.54
     Inventory     -0.54
    lves           -0.54
    Positive Logits
    ALLY           0.74
    ILY            0.73
    ered           0.68
    erest          0.65
     "$:/          0.64
    ivalent        0.63
    escription     0.63
    ich            0.62
     to            0.62
     bene          0.61
    Activation Density: 0.043%

    No Known Activations