INDEX
Explanations
mentions of favorites and preferences in various contexts
Negative Logits
er            -0.60
Ducks         -0.60
},            -0.60
ers           -0.57
I             -0.56
Ader          -0.55
ou            -0.55
was           -0.55
bilang        -0.54
In            -0.54

Positive Logits
favorite       2.05
favourite      1.97
favorites      1.93
Favorite       1.88
favorite       1.85
Favourite      1.84
favourite      1.81
favourites     1.80
Favorites      1.77
Favorite       1.75
Activations Density 0.037%
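
For readers unfamiliar with the density figure, the following is a minimal sketch of one common way such a number is computed: the fraction of token positions at which the feature's activation is above zero. This definition is an assumption, not something the page states, and the function name `activation_density`, the `threshold` parameter, and the synthetic activation list are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The dashboard may use a different definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data (not from the page): 100,000 token positions,
# of which 37 have a non-zero activation.
acts = [0.0] * 100_000
for i in range(37):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.037% would mean the feature fires on roughly 37 of every 100,000 tokens, which fits a feature that responds only to a narrow family of tokens such as the "favorite" variants listed above.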