INDEX
    Explanations

    phrases related to praising or approval

    Negative Logits
    ther              -0.77
    soDeliveryDate    -0.75
    ueller            -0.74
    ramid             -0.74
    abouts            -0.72
    itamin            -0.70
    bang              -0.70
    itol              -0.66
    claimer           -0.66
    few               -0.66
    Positive Logits
    ifully             0.94
    eous               0.80
     virtues           0.78
    iful               0.72
     bravery           0.69
    fully              0.67
     exemplary         0.67
    ously              0.66
    atory              0.65
    rous               0.65
    Activation Density 0.107%

    No Known Activations