INDEX
    Explanations

    questions related to experiences and emotions

    questions directed at individuals about their feelings, experiences, or opinions

    Negative Logits
    seless        -0.78
    acity         -0.73
    $$$$          -0.73
    ospace        -0.67
     Worse        -0.67
    cession       -0.66
     Stupid       -0.66
    .")           -0.66
     Godd         -0.65
    udicrous      -0.62
    Positive Logits
     yourselves   0.99
     yourself     0.99
    ?ãĢį          0.93
    )?            0.87
     your         0.76
     experien     0.76
     autobi       0.73
    ?:            0.73
    .?            0.72
    ?             0.71
    Act Density 0.261%

    No Known Activations