INDEX
    Explanations

    verbs related to action or decision making

    phrases or expressions related to causality and consequences

    NEGATIVE LOGITS
    "ector"      -0.82
    "allery"     -0.82
    "atform"     -0.76
    "INAL"       -0.76
    "vantage"    -0.74
    "eatures"    -0.72
    "uid"        -0.70
    "ributed"    -0.70
    "aic"        -0.69
    "ELD"        -0.67
    POSITIVE LOGITS
    " hating"        1.54
    " worrying"      1.48
    " forgetting"    1.45
    " messing"       1.40
    " pretending"    1.39
    " thinking"      1.36
    " wanting"       1.35
    " liking"        1.35
    " wasting"       1.35
    " wondering"     1.33
    Activation Density: 0.449%

    No Known Activations