INDEX
    Explanations

    mentions of favorites or preferences in various contexts

    Negative Logits
    Token    Logit
    "er"     -0.79
    " I"     -0.69
    " n"     -0.66
    "ers"    -0.65
    " In"    -0.65
    " l"     -0.64
    " ("     -0.63
    " N"     -0.63
    "ra"     -0.62
    " in"    -0.61

    Positive Logits
    Token            Logit
    " favorites"     1.51
    " Favorites"     1.50
    " favorite"      1.45
    " Favorite"      1.43
    " favourite"     1.39
    " favourites"    1.35
    " Favourite"     1.35
    "favorites"      1.35
    "favorite"       1.35
    " FAVORITE"      1.34
    Activation Density: 0.039%

    No Known Activations