    Explanations

    references to individuals and interactions with them

    Negative Logits
      "’s"          -0.21
      "has"         -0.17
      "shown"       -0.16
      " demanded"   -0.16
      " deemed"     -0.16
      " awaited"    -0.15
      " shown"      -0.15
      " hasn"       -0.15
      " presumed"   -0.15
      "'s"          -0.15
    Positive Logits
      " want"        0.58
      " think"       0.45
      " believe"     0.44
      " wish"        0.40
      " prefer"      0.39
      " know"        0.38
      " need"        0.36
      " hope"        0.35
      "want"         0.35
      " expect"      0.35
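The two lists above are the tokens whose logits the feature pushes down most and up most. A minimal sketch of how such top-k lists can be produced, assuming you already have a per-token "logit effect" score for the feature (for example, a decoder direction projected through the model's unembedding; that derivation is an assumption and is not shown here). The function name `top_logits` is hypothetical.

```python
def top_logits(logit_effect: dict[str, float], k: int = 10):
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical helper: `logit_effect` is assumed to map each token string
    (leading spaces included, e.g. " want" vs "want") to the feature's effect
    on that token's logit.
    """
    # Sort ascending by score so the most negative tokens come first.
    ranked = sorted(logit_effect.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Usage with a few values taken from the lists above.
effects = {" want": 0.58, " think": 0.45, "’s": -0.21, "has": -0.17}
neg, pos = top_logits(effects, k=2)
print(neg)  # [('’s', -0.21), ('has', -0.17)]
print(pos)  # [(' want', 0.58), (' think', 0.45)]
```

Keeping tokens as exact strings matters here: " want" and "want" are distinct vocabulary entries, which is why both appear in the positive list.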
    Act Density 0.077%
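Activation density is, in sketch form, the percentage of token positions on which the feature fires. The snippet below assumes "fires" means an activation strictly above a threshold of 0; the exact threshold and token sample used for the figure above are not stated on this page, so treat both as assumptions. The function name `activation_density` is hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly greater
    than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 77 firing positions out of 100,000 tokens gives a density of 0.077%.
sample = [1.0] * 77 + [0.0] * 99_923
print(round(activation_density(sample), 3))  # 0.077
```

At 0.077%, the feature fires on roughly 1 token in 1,300, so it is a sparse feature.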

    No Known Activations