    Explanations

    personal reflections or opinions expressed through language

    personal reflections on identity and beliefs

    Negative Logits
    Token       Logit
    mire        -0.69
    usky        -0.66
    ortium      -0.65
    anyahu      -0.63
    herry       -0.63
    flix        -0.63
    demon       -0.63
    elsen       -0.63
    lehem       -0.61
    Uncommon    -0.61
    Positive Logits
    Token           Logit
     perce          0.93
     decisions      0.92
     choices        0.90
     interactions   0.84
     conduct        0.83
     dealings       0.79
     behavior       0.78
     interact       0.76
     environments   0.75
     behaviors      0.74
    Activation Density 0.977%

    No Known Activations