INDEX
    Explanations

    instances of pretending or pretending-related actions

    instances of the word "pretend" and its variations

    Negative Logits
    "cedented"     -0.66
    "aird"         -0.63
    "otype"        -0.61
    "aic"          -0.61
    "ergy"         -0.60
    "ilings"       -0.60
    "uterte"       -0.59
    "cutting"      -0.58
    "atl"          -0.58
    " Citation"    -0.58
    Positive Logits
    " innocence"    1.01
    " ignorance"    0.85
    " allegiance"   0.78
    " otherwise"    0.76
    " pas"          0.72
    "antly"         0.66
    " insanity"     0.65
    " forgot"       0.64
    " equival"      0.64
    "ingly"         0.62
    Activation Density: 0.055%

    No Known Activations