INDEX
    Explanations

    phrases about acting in someone's best interests or moral beliefs

    interest, interests

    NEGATIVE LOGITS
    Token          Logit
    " covers"      -0.69
    " Covers"      -0.67
    "cover"        -0.63
    " cover"       -0.62
    "COVER"        -0.59
    "covers"       -0.59
    "Covers"       -0.59
    " COVER"       -0.58
    " Cover"       -0.54
    "Cover"        -0.53
    POSITIVE LOGITS
    Token          Logit
    " interests"   1.98
    " Interests"   1.70
    " interest"    1.69
    "interests"    1.66
    " INTEREST"    1.55
    " Interest"    1.50
    "interest"     1.47
    "Interest"     1.41
    "Interests"    1.32
    "INTEREST"     1.28
    Act Density 0.726%

    No Known Activations