INDEX
    Explanations

    phrases related to punishments or consequences

    Negative Logits
    "nces"               -0.77
    " unaccompanied"     -0.74
    "xual"               -0.73
    "IUM"                -0.69
    "ItemImage"          -0.67
    "iances"             -0.63
    "ibles"              -0.62
    " Dian"              -0.62
    "owan"               -0.61
    " confidentiality"   -0.61

    Positive Logits
    " forehead"           0.87
    " shoulder"           0.87
    " cheek"              0.86
    " heels"              0.76
    " toe"                0.71
    " crown"              0.71
    " neck"               0.71
    " shoulders"          0.69
    " os"                 0.68
    " chest"              0.68
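
    The positive and negative logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common way to obtain such lists (assumed here, not confirmed by this page) is to take the dot product of the feature's decoder direction with each token's column of the model's unembedding matrix, then keep the top-k and bottom-k scores. The sketch below is a minimal pure-Python illustration of that idea; `logit_effects` is a hypothetical helper, the toy weights are made up, and any final LayerNorm scaling a real pipeline might fold in is ignored.

```python
from typing import List, Tuple

Scored = List[Tuple[str, float]]


def logit_effects(
    decoder_row: List[float],
    unembed: List[List[float]],  # shape [d_model][vocab_size]
    vocab: List[str],
    k: int = 10,
) -> Tuple[Scored, Scored]:
    """Return (top-k positive, bottom-k negative) token effects of one feature.

    Hypothetical helper: each token's score is the dot product of the
    feature's decoder direction with that token's unembedding column.
    """
    d_model = len(decoder_row)
    scores = []
    for t, token in enumerate(vocab):
        s = sum(decoder_row[j] * unembed[j][t] for j in range(d_model))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy, made-up numbers: d_model = 2, vocabulary of 4 tokens.
    vocab = [" forehead", " cheek", "nces", " toe"]
    unembed = [[0.6, 0.5, -0.4, 0.3],
               [0.5, 0.4, -0.6, 0.2]]
    pos, neg = logit_effects([1.0, 0.5], unembed, vocab, k=2)
    print("positive:", [(t, round(s, 2)) for t, s in pos])
    print("negative:", [(t, round(s, 2)) for t, s in neg])
```

    Sorting once and slicing from both ends keeps the two lists consistent with each other; on a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.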
    Activation Density 0.120%
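
    Activation density is usually read as the share of tokens on which the feature fires. At 0.120%, that is roughly 12 tokens in every 10,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0 (the page does not state its threshold), and `activation_density` is a hypothetical helper rather than part of any dashboard API.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Hypothetical helper; assumes a token counts as active when its
    activation is strictly greater than `threshold`.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Made-up sample: 10,000 tokens, 12 of which activate the feature.
    sample = [0.0] * 9_988 + [1.5] * 12
    print(f"{activation_density(sample):.3f}%")
```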

    No Known Activations