    Explanations

    words related to psychological states or emotions

    terms related to themes of experimentation and exploitation

    Negative Logits
    hips      -0.93
    heny      -0.89
    ento      -0.84
    nings     -0.83
    itu       -0.83
    sen       -0.81
    ingham    -0.77
    iating    -0.77
    IFE       -0.76
    yss       -0.76

    Positive Logits
     grab     0.79
     ploy     0.77
     cipher   0.75
     hawk     0.73
     whore    0.73
     brake    0.73
     bunny    0.73
     glove    0.72
     hatch    0.72
     porn     0.72
    Activation Density: 0.301%

    No Known Activations