    Explanations

    information related to personal preferences or favorites

    the concept of personal favorites

    Negative Logits
    ulative   -0.81
    idem      -0.81
    aping     -0.79
    ural      -0.78
    uid       -0.78
    aton      -0.77
    asse      -0.76
    attle     -0.76
    heed      -0.75
    lam       -0.73
    Positive Logits
     haun      0.94
    Favorite   0.89
     haunt     0.86
     pokemon   0.83
     spots     0.81
     scenes    0.79
     spot      0.78
     hobbies   0.78
     tricks    0.77
     snack     0.77
    Act Density 0.048%

    No Known Activations