INDEX
    Explanations

    mentions of personal preferences or favorites

    references to favorite things or preferences

    Negative Logits
    redits      -0.73
    ulative     -0.73
    avis        -0.73
    lam         -0.73
    aping       -0.71
    uid         -0.70
    DEF         -0.70
    proof       -0.70
    usted       -0.70
    compliance  -0.69
    Positive Logits
     haun        1.01
     haunt       0.91
     beverage    0.87
     hobby       0.86
     tunes       0.86
     hobbies     0.86
     snack       0.83
     underdog    0.81
     childhood   0.81
     meal        0.78
    Act Density 0.059%

    No Known Activations