INDEX
    Explanations

    mentions of personal experiences and possessions

    Negative Logits
    Token         Logit
    "utan"        -0.64
    "hov"         -0.63
    "rities"      -0.60
    "esson"       -0.58
    "Lenin"       -0.58
    "Created"     -0.57
    "hire"        -0.57
    "rior"        -0.56
    "verages"     -0.56
    "namese"      -0.56
    Positive Logits
    Token         Logit
    " doors"      1.30
    " Pandora"    1.10
    " door"       1.04
    " Doors"      1.04
    " valves"     1.01
    " gates"      1.01
    " portals"    0.89
    " backdoor"   0.84
    "doors"       0.80
    " pores"      0.79
    Activation Density: 0.048%

    No Known Activations