INDEX
    Explanations

    adverbs ending in -ingly

    phrases and structures related to deception and pretense

    Negative Logits
    ourses       -0.85
    otiation     -0.81
    ests         -0.78
    cox          -0.78
    rer          -0.75
    atl          -0.74
    mentioned    -0.73
    cies         -0.72
    olphin       -0.72
    summary      -0.71
    Positive Logits
     invincible     0.85
     unbeat         0.84
     kindred        0.81
     unstoppable    0.79
     innocuous      0.76
     benign         0.75
     harmless       0.74
     resemblance    0.72
     spurious       0.71
     immune         0.70
    Activation Density: 0.521%

    No Known Activations