INDEX
    Explanations

    references to personal experiences and emotional reactions

    Negative Logits
    ses        -0.27
    bidden     -0.21
    pired      -0.20
    /or        -0.20
    tempts     -0.18
    pires      -0.18
    cribed     -0.18
    woke       -0.18
    ductive    -0.17
    quired     -0.17
    Positive Logits
    orem       0.51
    oret       0.34
    oretical   0.30
    ories      0.26
    semble     0.26
    notated    0.25
    /Set       0.23
    grily      0.22
    /Edit      0.22
    /Sub       0.21
    Activation Density: 5.273%

    No Known Activations