    Explanations

    phrases related to dishonesty or deceit

    references to deception or dishonesty

    Negative Logits
    Token            Logit
    "ugal"           -0.81
    "runs"           -0.70
    "sylv"           -0.66
    "hens"           -0.65
    "night"          -0.62
    " Flavoring"     -0.61
    "orsi"           -0.61
    "uries"          -0.59
    "icals"          -0.59
    "}}}"            -0.59

    Positive Logits
    Token            Logit
    " awake"         1.04
    "uten"           0.90
    " detector"      0.90
    " dormant"       0.90
    "utenant"        0.78
    "pard"           0.76
    " silently"      0.76
    " quietly"       0.75
    " asleep"        0.72
    "yss"            0.72

    Act Density 0.026%

    No Known Activations