INDEX
    Explanations

    words related to concepts of control, influence, and manipulation

    words and patterns related to confirmation or agreement

    Negative Logits
    chev      -0.85
    itsch     -0.69
    abouts    -0.64
    \/\/      -0.64
    WARD      -0.61
    itta      -0.61
     Uriel    -0.60
     labels   -0.58
    tarians   -0.58
    agara     -0.58
    Positive Logits
    ciating   0.87
    ctions    0.83
    enment    0.82
    ctory     0.81
    uration   0.80
    rences    0.78
    nces      0.76
    ruction   0.75
    rency     0.75
    ption     0.74
    Activation Density: 0.065%

    No Known Activations