INDEX
    Explanations

    mentions of favorites and preferences in various contexts

    Negative Logits
    Token      Logit
    "er"       -0.60
    "Ducks"    -0.60
    " },\n"    -0.60
    "ers"      -0.57
    " I"       -0.56
    " Ader"    -0.55
    "ou"       -0.55
    " was"     -0.55
    "bilang"   -0.54
    " In"      -0.54
    Positive Logits
    Token           Logit
    " favorite"     2.05
    " favourite"    1.97
    " favorites"    1.93
    " Favorite"     1.88
    "favorite"      1.85
    " Favourite"    1.84
    "favourite"     1.81
    " favourites"   1.80
    " Favorites"    1.77
    "Favorite"      1.75
    Activation Density: 0.037%

    No Known Activations