INDEX
    Explanations

    words related to expectations or norms

    phrases related to expectations or societal norms

    Negative Logits
    ' Blaz'         -0.56
    'Ey'            -0.55
    ' Kis'          -0.54
    ' Flavoring'    -0.54
    'redo'          -0.53
    ' Bulgar'       -0.51
    ' stru'         -0.51
    'owitz'         -0.51
    ' Bohem'        -0.49
    ' Quote'        -0.49
    Positive Logits
    ' to'            1.06
    'to'             0.95
    ' TO'            0.77
    'toc'            0.71
    'ered'           0.69
    'entious'        0.69
    'Disclaimer'     0.67
    ' "$:/'          0.67
    ' ta'            0.67
    'ALLY'           0.67
    Activation Density: 0.026%

    No Known Activations