INDEX
    Explanations

    prompts asking for opinions, thoughts, influences, or experiences

    questions that inquire about personal experiences or preferences

    Negative Logits
    Token        Logit
    "geoning"    -0.80
    "alde"       -0.77
    "het"        -0.77
    "wald"       -0.73
    "oval"       -0.72
    "ammed"      -0.69
    "overe"      -0.69
    "aea"        -0.68
    "avage"      -0.68
    "osta"       -0.68
    Positive Logits
    Token                Logit
    " favourite"         1.23
    " favorite"          1.17
    " Favorite"          1.15
    "Favorite"           1.12
    " hobbies"           1.00
    " favourites"        0.94
    " favorites"         0.93
    " coolest"           0.92
    " misconceptions"    0.90
    " inspir"            0.87
    Activation Density: 0.172%
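
    As a rough illustration of what a density figure like this usually measures, the sketch below computes the fraction of token positions on which a feature fires. It assumes density means "share of positions with activation above a threshold"; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of positions where the feature is active.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 17 of which are active.
sample = [0.0] * 10_000
for i in range(17):
    sample[i * 500] = 1.5  # arbitrary positive activation value

density = activation_density(sample)
print(f"{density:.3%}")
```

    A density well under 1% under this definition would mean the feature is sparse, firing on only a small share of the tokens it was evaluated on.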

    No Known Activations